Abstract. We consider the inverse problem of estimating an unknown function $u$ from noisy
measurements $y$ of a known, possibly nonlinear, map $\mathcal{G}$ applied to $u$. We adopt a Bayesian approach to
the problem and work in a setting where the prior measure is specified as a Gaussian random field $\mu_0$.
We work under a natural set of conditions on the likelihood which imply the existence of a well-posed
posterior measure, $\mu^y$. Under these conditions we show that the maximum a posteriori (MAP)
estimator is well-defined as the minimiser of an Onsager-Machlup functional defined on the
Cameron-Martin space of the prior; thus we link a problem in probability with a problem in the calculus of
variations. We then consider the case where the observational noise vanishes and establish a form
of Bayesian posterior consistency. We also prove a similar result for the case where the observation
of $\mathcal{G}(u)$ can be repeated as many times as desired with independent identically distributed noise.
The theory is illustrated with examples from an inverse problem for the Navier-Stokes equation,
motivated by problems arising in weather forecasting, and from the theory of conditioned diffusions,
motivated by problems arising in molecular dynamics.



1. Introduction. Consider a centred (mean-zero) Gaussian measure $\mu_0$ on a Banach
space $(X, \|\cdot\|_X)$ with Cameron-Martin space $(E, \langle\cdot,\cdot\rangle_E, \|\cdot\|_E)$, and a measure
$\mu$ which has a density with respect to $\mu_0$:

$$\frac{d\mu}{d\mu_0}(u) \propto \exp(-\Phi(u)). \tag{1.1}$$

Measures $\mu$ of this form arise naturally in a number of applications, including the
theory of conditioned diffusions [16] and the Bayesian approach to inverse problems
P5]. In these settings there are many applications where $\Phi\colon X \to \mathbb{R}$ is a locally
Lipschitz continuous function and it is in this setting that we work. Our interest is in
defining the concept of "most likely" functions with respect to the measure $\mu$, and in
particular the maximum a posteriori estimator in the Bayesian context. We will refer 
to such functions as MAP estimators throughout. We will define the concept precisely 
and link it to a problem in the calculus of variations, study posterior consistency of the 
MAP estimator in the Bayesian setting, and compute it for a number of illustrative 
applications. 

To motivate the form of MAP estimators considered here we consider the case
where $X$ is finite dimensional and the prior $\mu_0$ is Gaussian $\mathcal{N}(0, C_0)$. This prior has
density proportional to $\exp(-\frac12 |C_0^{-1/2} u|^2)$ with respect to the Lebesgue measure, where $|\cdot|$
denotes the Euclidean norm. The probability density for $\mu$ with respect to the Lebesgue measure,
given by (1.1), is maximised at minimisers of

$$I(u) := \Phi(u) + \frac12 \|u\|_E^2, \tag{1.2}$$

where $\|\cdot\|_E = |C_0^{-1/2}\,\cdot\,|$ is the Cameron-Martin norm of the Gaussian measure $\mu_0$.
We would like to derive such a result in the infinite dimensional setting. The main
technical difficulty that is encountered stems from the fact that the Cameron-Martin
space has measure zero with respect to the distributions $\mu_0$ and $\mu$.
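As a concrete illustration of the finite dimensional statement, the following minimal sketch computes the minimiser of $I$ for a linear-Gaussian problem on $\mathbb{R}^2$. The matrices `C0` and `A`, the noise level `gamma` and the data `y` are hypothetical values chosen only for illustration; the potential is taken to be $\Phi(u) = \frac{1}{2\gamma^2}|y - Au|^2$, which is an assumption and not a model from this paper.

```python
import numpy as np

# Hypothetical finite-dimensional setup (illustrative values only):
# X = R^2, prior N(0, C0), linear forward map A, noise covariance gamma^2 * Id.
C0 = np.array([[2.0, 0.3], [0.3, 0.5]])
A = np.array([[1.0, 2.0], [0.0, 1.0], [1.0, -1.0]])
gamma = 0.5
y = np.array([1.0, -0.5, 2.0])

C0_inv = np.linalg.inv(C0)

def Phi(u):
    """Potential (negative log-likelihood) for y = A u + noise."""
    r = y - A @ u
    return 0.5 * (r @ r) / gamma**2

def I(u):
    """Functional (1.2): Phi(u) + 0.5 * |C0^{-1/2} u|^2."""
    return Phi(u) + 0.5 * (u @ C0_inv @ u)

# For this quadratic Phi the minimiser of I solves the normal equations
# (A^T A / gamma^2 + C0^{-1}) u = A^T y / gamma^2.
H = A.T @ A / gamma**2 + C0_inv
u_map = np.linalg.solve(H, A.T @ y / gamma**2)

# Plain gradient descent on I as an independent check of the minimiser.
u = np.zeros(2)
step = 1.0 / np.linalg.eigvalsh(H).max()
for _ in range(5000):
    grad = -A.T @ (y - A @ u) / gamma**2 + C0_inv @ u
    u = u - step * grad
```

Both routes should agree here because $I$ is strictly convex for this choice of $\Phi$; for a nonlinear forward map the minimiser need not be unique.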

The natural way to talk about MAP estimators in the infinite dimensional setting
is to seek the centre of a small ball with maximal probability, and then study the limit
of this centre as the radius of the ball shrinks to zero. To this end, let $B^\delta(z) \subset X$ be
the open ball of radius $\delta$ centred at $z \in X$. If there is a functional $I$, defined on $E$,
which satisfies

$$\lim_{\delta\to0} \frac{\mu(B^\delta(z_2))}{\mu(B^\delta(z_1))} = \exp\big(I(z_1) - I(z_2)\big) \tag{1.3}$$

for all $z_1, z_2 \in E \subseteq X$, then $I$ is termed the Onsager-Machlup functional [11] [19]. For
any fixed $z_1$, the function $z_2$ for which the above limit is maximal is a natural candidate
for the MAP estimator of $\mu$ and is clearly given by minimisers of the Onsager-Machlup
functional. In finite dimensions it is clear that $I$ given by (1.2) is the Onsager-Machlup
functional. We will generalize this result to the infinite dimensional setting and show
that the MAP estimators, which we define using centres of shrinking small balls with
maximal probability, are characterised by minimisers of the functional $I$.
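The small-ball limit can be checked numerically in one dimension, where $\mu$ has Lebesgue density proportional to $\exp(-I(u))$. The sketch below takes $\mu_0 = \mathcal{N}(0,1)$ on $\mathbb{R}$ and a hypothetical potential $\Phi(u) = \frac12(u^2-1)^2$ (an illustrative choice, not one from this paper), and compares the ratio $\mu(B^\delta(z_2))/\mu(B^\delta(z_1))$ with $\exp(I(z_1) - I(z_2))$ as $\delta$ decreases.

```python
import numpy as np

def Phi(u):
    # Hypothetical locally Lipschitz potential, chosen only for illustration.
    return 0.5 * (u**2 - 1.0)**2

def I(u):
    # Onsager-Machlup functional (1.2) for the prior N(0, 1) on R.
    return Phi(u) + 0.5 * u**2

def ball_mass(z, delta, n=20001):
    """Unnormalised mu(B^delta(z)): integral of exp(-I) over (z - delta, z + delta)."""
    x = np.linspace(z - delta, z + delta, n)
    f = np.exp(-I(x))
    dx = x[1] - x[0]
    return dx * (f.sum() - 0.5 * (f[0] + f[-1]))  # trapezoidal rule

z1, z2 = 0.3, 1.2
target = np.exp(I(z1) - I(z2))
# The normalisation constant of mu cancels in the ratio.
ratios = {d: ball_mass(z2, d) / ball_mass(z1, d) for d in (0.5, 0.1, 0.01)}
```

The dictionary `ratios` should approach `target` as the radius shrinks; in infinite dimensions no Lebesgue density is available and the same limit has to be obtained from small-ball properties of the Gaussian measure instead.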

When the probability measure arises from the Bayesian formulation of inverse 
problems, it is natural to ask whether the MAP estimator is close to the truth un- 
derlying the data, in either the small noise or large data limits. This is a form of 
Bayesian posterior consistency, here defined in terms of the MAP estimator only. We 
will study this question for finite observations of a nonlinear forward model, subject 
to Gaussian additive noise. 

The paper is organized as follows:

• in section 2 we detail our assumptions on $\Phi$ and $\mu_0$;

• in section 3 we give conditions for the existence of an Onsager-Machlup
functional $I$ and show that the MAP estimator is well-defined as the minimiser
of this functional;

• in section 4 we study the problem of Bayesian posterior consistency by
studying limits of Onsager-Machlup minimisers in the small noise and large data
limits;

• in section 5 we study applications arising from data assimilation for the
Navier-Stokes equation, as a model for what is done in weather prediction;

• in section 6 we study applications arising in the theory of conditioned
diffusions.

We conclude the introduction with a brief literature review. We first note that the
functional $I$ in (1.2) resembles a Tikhonov-Phillips regularization of the minimisation
problem for $\Phi$ [12], with the Cameron-Martin norm of the prior determining the
regularization. In the theory of classical non-statistical inversion, formulation via
Tikhonov-Phillips regularization leads to an infinite dimensional optimization problem
and has led to deeper understanding and improved algorithms. Our aim is to achieve
the same in a probabilistic context. One way of defining a MAP estimator for $\mu$ given
by (1.1) is to consider the limit of parametric MAP estimators: first discretize the
function space using $n$ parameters, and then apply the finite dimensional argument
above to identify an Onsager-Machlup functional on $\mathbb{R}^n$. Passing to the limit $n \to \infty$
in the functional provides a candidate for the limiting Onsager-Machlup functional.
This approach is taken in [531 IMl HZj for problems arising in conditioned diffusions.
Unfortunately, however, it does not necessarily lead to the correct identification of the
Onsager-Machlup functional as defined by (1.3). We study the problem directly in
the infinite dimensional setting, without using discretization, leading, we believe, to
greater clarity. The infinite dimensional perspective on MAP estimation has
been widely studied for diffusion processes [9] and related stochastic PDEs [2^; see
[30] for an overview. Our general setting is similar to that used to study the specific
applications arising in the papers [9, 29, 30]. By working with small ball properties
of Gaussian measures, and assuming that $\Phi$ has natural continuity properties, we are
able to derive results in considerable generality. There is a recent related definition
of MAP estimators in [17], with application to density estimation in [13]. However,
whilst the goal of minimising $I$ is also identified in [17], the proof in that paper is
only valid in finite dimensions since it implicitly assumes that the Cameron-Martin
norm is $\mu_0$-a.s. finite. In our specific application to fluid mechanics our analysis
demonstrates that widely used variational methods [5] may be interpreted as MAP
estimators for an appropriate Bayesian inverse problem and, in particular, that this
interpretation, which is understood in the atmospheric sciences community in the
finite dimensional context, is well-defined in the limit of infinite spatial resolution.

Posterior consistency in Bayesian nonparametric statistics has a long history [13].
The study of posterior consistency for the Bayesian approach to inverse problems is 
starting to receive considerable attention. The papers [201 H] are devoted to obtaining 
rates of convergence for linear inverse problems with conjugate Gaussian priors, whilst 
the papers [4l[2l] study non-conjugate priors for linear inverse problems. Our analysis 
of posterior consistency concerns nonlinear problems, and finite data sets, so that 
multiple solutions are possible. We prove an appropriate weak form of posterior 
consistency, without rates, building on ideas appearing in [3]. 

2. Set-up. Throughout this paper we assume that $(X, \|\cdot\|_X)$ is a separable
Banach space and that $\mu_0$ is a centred Gaussian (probability) measure on $X$ with
Cameron-Martin space $(E, \langle\cdot,\cdot\rangle_E, \|\cdot\|_E)$. The measure $\mu$ of interest is given by (1.1)
and we make the following assumptions concerning the potential $\Phi$.

Assumption 2.1. The function $\Phi\colon X \to \mathbb{R}$ satisfies the following conditions:

(i) For every $\varepsilon > 0$ there is an $M \in \mathbb{R}$ such that, for all $u \in X$,

$$\Phi(u) \ge M - \varepsilon \|u\|_X^2.$$

(ii) $\Phi$ is locally bounded from above, i.e. for every $r > 0$ there exists $K = K(r) > 0$
such that, for all $u \in X$ with $\|u\|_X < r$, we have

$$\Phi(u) \le K.$$

(iii) $\Phi$ is locally Lipschitz continuous, i.e. for every $r > 0$ there exists $L = L(r) > 0$
such that, for all $u_1, u_2 \in X$ with $\|u_1\|_X, \|u_2\|_X < r$, we have

$$|\Phi(u_1) - \Phi(u_2)| \le L \|u_1 - u_2\|_X.$$

From [23] we know that under Assumption 2.1 the function $\exp(-\Phi)$ is integrable
with respect to $\mu_0$ and thus $\mu$ given by (1.1) can indeed be normalized to give a
probability measure. Finally, we define a function $I\colon X \to \overline{\mathbb{R}}$ by

$$I(u) = \begin{cases} \Phi(u) + \frac12\|u\|_E^2 & \text{if } u \in E, \text{ and} \\ +\infty & \text{else.} \end{cases} \tag{2.1}$$

We will see in section 3 that $I$ is the Onsager-Machlup functional.

Remark 2.2. We close with a brief remark concerning the definition of the
Onsager-Machlup functional in the case of a non-centred reference measure $\mu_0 = \mathcal{N}(m, C_0)$.
Shifting coordinates by $m$ it is possible to apply the theory based on the centred Gaussian
measure $\mathcal{N}(0, C_0)$, and then undo the coordinate change. The relevant Onsager-Machlup
functional can then be shown to be

$$I(u) = \begin{cases} \Phi(u) + \frac12\|u - m\|_E^2 & \text{if } u - m \in E, \text{ and} \\ +\infty & \text{else.} \end{cases}$$
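The coordinate shift in Remark 2.2 can be illustrated in one dimension. The following sketch assumes a hypothetical prior $\mathcal{N}(m, c_0)$ on $\mathbb{R}$ and an illustrative smooth potential (neither is taken from this paper), and checks that minimising the non-centred functional in $u$ agrees with minimising the centred functional in the shifted variable $v = u - m$ and then undoing the shift.

```python
import numpy as np

# Hypothetical 1D example: prior N(m, c0) and a smooth potential Phi.
m, c0 = 1.5, 0.8

def Phi(u):
    return np.log1p(np.exp(-2.0 * u))  # illustrative choice, not from the paper

def I_noncentred(u):
    # Remark 2.2: Phi(u) + 0.5 * ||u - m||_E^2, with ||h||_E^2 = h^2 / c0 on R.
    return Phi(u) + 0.5 * (u - m)**2 / c0

def I_shifted(v):
    # Centred problem in the shifted coordinate v = u - m.
    return Phi(v + m) + 0.5 * v**2 / c0

grid = np.linspace(-5.0, 5.0, 200001)
u_star = grid[np.argmin(I_noncentred(grid))]   # minimiser in the original variable
v_star = grid[np.argmin(I_shifted(grid))]      # minimiser in the shifted variable
```

Up to the grid resolution, `u_star` and `v_star + m` should coincide.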



3. MAP estimators and the Onsager-Machlup functional. In this section
we prove two main results. The first, Theorem 3.2, establishes that $I$ given by (1.2)
is indeed the Onsager-Machlup functional for the measure $\mu$ given by (1.1). Then
Theorem 3.5 and Corollary 3.10 show that the MAP estimators, defined precisely in
Definition 3.1, are characterised by the minimisers of the Onsager-Machlup functional.

For $z \in X$, let $B^\delta(z) \subset X$ be the open ball of radius $\delta$ in $X$ centred at $z$. Let

$$J^\delta(z) = \mu(B^\delta(z))$$

be the mass of the ball centred at $z \in X$. We first define the MAP estimator for
$\mu$ as follows:

Definition 3.1. Let

$$z^\delta = \operatorname*{argmax}_{z \in X} J^\delta(z).$$

Any point $\tilde z \in X$ satisfying $\lim_{\delta\to0} J^\delta(\tilde z)/J^\delta(z^\delta) = 1$ is a MAP estimator for the
measure $\mu$ given by (1.1).

We show later on (Theorem 3.5) that a strongly convergent subsequence of $\{z^\delta\}_{\delta>0}$
exists and its limit, that we prove to be in $E$, is a MAP estimator and also minimises
the Onsager-Machlup functional $I$. Corollary 3.10 then shows that any MAP estimator
$\tilde z$ as given in Definition 3.1 lives in $E$ as well, and minimisers of $I$ characterise all
MAP estimators of $\mu$.
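The role of the maximising centres $z^\delta$ can be seen in a one dimensional sketch. Here $\mu$ is assumed to have unnormalised Lebesgue density $\exp(-I(u))$ with the hypothetical choice $I(u) = (u-1)^4 + \frac12 u^2$ (prior $\mathcal{N}(0,1)$, potential $\Phi(u) = (u-1)^4$; an illustration only), whose unique minimiser is $u = 1/2$. The code locates the centre of the ball of radius $\delta$ with maximal mass on a grid and compares it with the minimiser of $I$.

```python
import numpy as np

def I(u):
    # Hypothetical 1D Onsager-Machlup functional: Phi(u) + u^2 / 2 with
    # Phi(u) = (u - 1)^4, so exp(-I) is the unnormalised Lebesgue density of mu.
    return (u - 1.0)**4 + 0.5 * u**2

x = np.linspace(-4.0, 4.0, 80001)            # quadrature grid, spacing 1e-4
dx = x[1] - x[0]
# cdf[i] approximates the mass of the grid points x[0], ..., x[i-1].
cdf = np.concatenate(([0.0], np.cumsum(np.exp(-I(x))) * dx))

def argmax_ball_mass(delta):
    """Return z^delta, the grid centre of the radius-delta ball with largest mass."""
    k = int(round(delta / dx))                # half-width of the ball in grid points
    mass = cdf[2 * k + 1:] - cdf[:-(2 * k + 1)]   # mass of each window of 2k+1 points
    return x[k + np.argmax(mass)]

z_star = x[np.argmin(I(x))]                   # minimiser of I on the grid
centres = {d: argmax_ball_mass(d) for d in (1.0, 0.3, 0.05)}
```

Because the density is not symmetric about its mode, the centre for a large radius is visibly displaced from `z_star`, and the displacement shrinks as $\delta \to 0$.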

We first need to show that $I$ is the Onsager-Machlup functional for our problem:

Theorem 3.2. Let Assumption 2.1 hold. Then the function $I$ defined by (2.1) is
the Onsager-Machlup functional for $\mu$, i.e. for any $z_1, z_2 \in E$ we have

$$\lim_{\delta\to0} \frac{J^\delta(z_1)}{J^\delta(z_2)} = \exp\big(I(z_2) - I(z_1)\big).$$



Proof. Note that $J^\delta(z)$ is finite and positive for any $z \in E$ by Assumptions 2.1(i),(ii)
together with the Fernique Theorem and the positive mass of all balls in $X$, centred
at points in $E$, under Gaussian measure [5]. The key estimate in the proof is the
following consequence of Proposition 3 in Section 18 of [21]:

$$\lim_{\delta\to0} \frac{\mu_0(B^\delta(z_1))}{\mu_0(B^\delta(z_2))} = \exp\Big(\frac12\|z_2\|_E^2 - \frac12\|z_1\|_E^2\Big). \tag{3.1}$$

This is the key estimate in the proof since it transfers questions about probability,
naturally asked on the space $X$ of full measure under $\mu_0$, into statements concerning
the Cameron-Martin norm of $\mu_0$, which is almost surely infinite under $\mu_0$.

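We have

$$\frac{J^\delta(z_1)}{J^\delta(z_2)} = \frac{\int_{B^\delta(z_1)} \exp(-\Phi(u))\,\mu_0(du)}{\int_{B^\delta(z_2)} \exp(-\Phi(v))\,\mu_0(dv)} = \frac{\int_{B^\delta(z_1)} \exp(-\Phi(u)+\Phi(z_1))\exp(-\Phi(z_1))\,\mu_0(du)}{\int_{B^\delta(z_2)} \exp(-\Phi(v)+\Phi(z_2))\exp(-\Phi(z_2))\,\mu_0(dv)}.$$

By Assumption 2.1(iii), for any $u, v \in X$,

$$-L\|u-v\|_X \le \Phi(u) - \Phi(v) \le L\|u-v\|_X,$$

where $L = L(r)$ with $r > \max\{\|u\|_X, \|v\|_X\}$. Therefore, setting $L_1 = L(\|z_1\|_X + \delta)$
and $L_2 = L(\|z_2\|_X + \delta)$, we can write

$$\frac{J^\delta(z_1)}{J^\delta(z_2)} \le e^{\delta(L_1+L_2)}\, \frac{\int_{B^\delta(z_1)}\exp(-\Phi(z_1))\,\mu_0(du)}{\int_{B^\delta(z_2)}\exp(-\Phi(z_2))\,\mu_0(dv)}.$$

Now, by (3.1), we have

$$\frac{\int_{B^\delta(z_1)}\mu_0(du)}{\int_{B^\delta(z_2)}\mu_0(dv)} = r_1(\delta)\, e^{\frac12\|z_2\|_E^2 - \frac12\|z_1\|_E^2}$$

with $r_1(\delta) \to 1$ as $\delta \to 0$. Thus

$$\frac{J^\delta(z_1)}{J^\delta(z_2)} \le r_1(\delta)\, e^{\delta(L_2+L_1)}\, e^{-I(z_1)+I(z_2)}$$

and hence

$$\limsup_{\delta\to0} \frac{J^\delta(z_1)}{J^\delta(z_2)} \le e^{-I(z_1)+I(z_2)}. \tag{3.2}$$

Similarly we obtain

$$\frac{J^\delta(z_1)}{J^\delta(z_2)} \ge r_2(\delta)\, e^{-\delta(L_2+L_1)}\, e^{-I(z_1)+I(z_2)}$$

with $r_2(\delta) \to 1$ as $\delta \to 0$ and deduce that

$$\liminf_{\delta\to0} \frac{J^\delta(z_1)}{J^\delta(z_2)} \ge e^{-I(z_1)+I(z_2)}. \tag{3.3}$$

Inequalities (3.2) and (3.3) give the desired result. □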

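We note that similar methods of analysis show the following:

Corollary 3.3. Let the assumptions of Theorem 3.2 hold. Then for any $z \in E$

$$\lim_{\delta\to0} \frac{J^\delta(z)}{\int_{B^\delta(0)}\mu_0(du)} = \frac1Z\, e^{-I(z)},$$

where $Z = \int_X \exp(-\Phi(u))\,\mu_0(du)$.

Proof. Noting that we consider $\mu$ to be a probability measure and hence

$$\frac{J^\delta(z)}{\int_{B^\delta(0)}\mu_0(du)} = \frac{\frac1Z\int_{B^\delta(z)}\exp(-\Phi(u))\,\mu_0(du)}{\int_{B^\delta(0)}\mu_0(du)}$$

with $Z = \int_X \exp(-\Phi(u))\,\mu_0(du)$, arguing along the lines of the proof of the above
theorem gives

$$\frac1Z\, r(\delta)\, e^{-\delta L} e^{-I(z)} \le \frac{J^\delta(z)}{\int_{B^\delta(0)}\mu_0(du)} \le \frac1Z\, r(\delta)\, e^{\delta L} e^{-I(z)}$$

with $L = L(\|z\|_X + \delta)$ (where $L(\cdot)$ is as in Assumption 2.1) and $r(\delta) \to 1$ as $\delta \to 0$.
The result then follows by taking limsup and liminf as $\delta \to 0$. □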



Proposition 3.4. Suppose Assumptions 2.1 hold. Then the minimum of $I\colon E \to \mathbb{R}$
is attained for some element $z^* \in E$.

Proof. The existence of a minimiser of $I$ in $E$, under the given assumptions,
is proved as Theorem 5.4 in [28] (and as Theorem 2.7 in [7] in the case that $\Phi$ is
non-negative). □

The rest of this section is devoted to a proof of the result that MAP estimators
can be characterised as minimisers of the Onsager-Machlup functional $I$ (Theorem 3.5
and Corollary 3.10).

Theorem 3.5. Suppose that Assumptions 2.1(ii) and (iii) hold. Assume also
that there exists an $M \in \mathbb{R}$ such that $\Phi(u) \ge M$ for any $u \in X$.

i) Let $z^\delta = \operatorname{argmax}_{z\in X} J^\delta(z)$. There is a $\bar z \in E$ and a subsequence of $\{z^\delta\}_{\delta>0}$
which converges to $\bar z$ strongly in $X$.
ii) The limit $\bar z$ is a MAP estimator and a minimiser of $I$.

The proof of this theorem is based on several lemmas. We state and prove these
lemmas first and defer the proof of Theorem 3.5 to the end of the section where we
also state and prove a corollary characterising the MAP estimators as minimisers of
the Onsager-Machlup functional.

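Lemma 3.6. Let $\delta > 0$. For any centred Gaussian measure $\mu_0$ on a separable
Banach space $X$ we have

$$\frac{J_0^\delta(z)}{J_0^\delta(0)} \le c\, e^{-\frac{a_1}{2}(\|z\|_X - \delta)^2},$$

where $J_0^\delta(z) := \mu_0(B^\delta(z))$, $c = \exp(\frac{a_1}{2}\delta^2)$ and $a_1$ is a constant independent of $z$ and $\delta$.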

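Proof. We first show that this is true for a centred Gaussian measure on $\mathbb{R}^n$ with
the covariance matrix $C = \operatorname{diag}[\lambda_1, \dots, \lambda_n]$ in basis $\{e_1, \dots, e_n\}$, where $\lambda_1 \ge \lambda_2 \ge
\cdots \ge \lambda_n$. Let $a_j = 1/\lambda_j$, and $|z|^2 = z_1^2 + \cdots + z_n^2$. Define

$$J_{0,n}^\delta(z) := \int_{B^\delta(z)} e^{-\frac12(a_1x_1^2 + \cdots + a_nx_n^2)}\,dx, \quad \text{for any } z \in \mathbb{R}^n, \tag{3.4}$$

and with $B^\delta(z)$ the ball of radius $\delta$ and centre $z$ in $\mathbb{R}^n$. We have

$$\frac{J_{0,n}^\delta(z)}{J_{0,n}^\delta(0)} = \frac{\int_{B^\delta(z)} e^{-\frac12(a_1x_1^2 + \cdots + a_nx_n^2)}\,dx}{\int_{B^\delta(0)} e^{-\frac12(a_1x_1^2 + \cdots + a_nx_n^2)}\,dx}
\le \frac{e^{-\frac12(a_1-\varepsilon)(|z|-\delta)^2}\int_{B^\delta(z)} e^{-\frac12(\varepsilon x_1^2 + (a_2-a_1+\varepsilon)x_2^2 + \cdots + (a_n-a_1+\varepsilon)x_n^2)}\,dx}{e^{-\frac12(a_1-\varepsilon)\delta^2}\int_{B^\delta(0)} e^{-\frac12(\varepsilon x_1^2 + (a_2-a_1+\varepsilon)x_2^2 + \cdots + (a_n-a_1+\varepsilon)x_n^2)}\,dx}
\le c\, e^{-\frac12(a_1-\varepsilon)(|z|-\delta)^2}\, \frac{\int_{B^\delta(z)}\hat\mu_0(dx)}{\int_{B^\delta(0)}\hat\mu_0(dx)},$$

for any $\varepsilon < a_1$ and where $\hat\mu_0$ is a centred Gaussian measure on $\mathbb{R}^n$ with the covariance
matrix $\operatorname{diag}[1/\varepsilon, 1/(a_2 - a_1 + \varepsilon), \dots, 1/(a_n - a_1 + \varepsilon)]$ (noting that $a_n \ge a_{n-1} \ge \cdots \ge
a_1$). By Anderson's inequality $\hat\mu_0(B(z,\delta)) \le \hat\mu_0(B(0,\delta))$ and therefore

$$\frac{J_{0,n}^\delta(z)}{J_{0,n}^\delta(0)} \le c\, e^{-\frac12(a_1-\varepsilon)(|z|-\delta)^2},$$

and since $\varepsilon$ is arbitrarily small the result follows for the finite-dimensional case.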
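The finite-dimensional bound, in the limit $\varepsilon \to 0$, reads $\mu_0(B^\delta(z))/\mu_0(B^\delta(0)) \le \exp(\frac{a_1}{2}\delta^2)\exp(-\frac{a_1}{2}(|z|-\delta)^2)$. A quick sanity check in the simplest case $n = 1$ is sketched below; the variance `lam` and the radius `delta` are hypothetical values, and ball masses are evaluated with the error function.

```python
import math

def gauss_ball_mass(z, delta, lam):
    """mu_0(B^delta(z)) for mu_0 = N(0, lam) on R, via the error function."""
    s = math.sqrt(2.0 * lam)
    return 0.5 * (math.erf((z + delta) / s) - math.erf((z - delta) / s))

def lemma_bound(z, delta, lam):
    """Right-hand side c * exp(-a1/2 * (|z| - delta)^2) with a1 = 1/lam."""
    a1 = 1.0 / lam
    c = math.exp(0.5 * a1 * delta**2)
    return c * math.exp(-0.5 * a1 * (abs(z) - delta)**2)

lam, delta = 0.7, 0.2        # illustrative values only
checks = []
for z in (0.0, 0.1, 0.5, 1.0, 2.0, 4.0):
    ratio = gauss_ball_mass(z, delta, lam) / gauss_ball_mass(0.0, delta, lam)
    checks.append((z, ratio, lemma_bound(z, delta, lam)))
```

Each tuple in `checks` holds the centre, the ratio of ball masses and the bound; the ratio should not exceed the bound, and both decay like a Gaussian in $|z|$.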

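To show the result for an infinite dimensional separable Banach space $X$, we first
note that $\{e_j\}_{j=1}^\infty$, the orthogonal basis in the Cameron-Martin space of $X$ for $\mu_0$,
separates the points in $X$; therefore $T\colon u \mapsto \{e_j(u)\}_{j=1}^\infty$ is an injective map from $X$
into $\mathbb{R}^\infty$. Let $u_j = e_j(u)$ and

$$P_n u = (u_1, u_2, \dots, u_n, 0, 0, \dots).$$

Then, since $\mu_0$ is a Radon measure, for the balls $B(0, \delta)$ and $B(z, \delta)$, for any $\varepsilon_0 > 0$,
there exists large enough $N$ such that the cylindrical sets $A_0 = P_n^{-1}(P_n(B^\delta(0)))$ and
$A_z = P_n^{-1}(P_n(B^\delta(z)))$ satisfy $\mu_0(B^\delta(0) \,\triangle\, A_0) < \varepsilon_0$ and $\mu_0(B^\delta(z) \,\triangle\, A_z) < \varepsilon_0$ for $n > N$
[5]. Let $z_j = (z, e_j)$ and $z^n = (z_1, z_2, \dots, z_n, 0, \dots)$, and for $0 < \varepsilon_1 < \delta/2$ take $n > N$
large enough so that $\|z - z^n\|_X \le \varepsilon_1$. With $\alpha = c\, e^{-\frac{a_1}{2}(\|z\|_X - \varepsilon_1 - \delta)^2}$ we have

$$J_0^\delta(z) \le \alpha\, J_0^\delta(0) + (1 + \alpha)\,\varepsilon_0.$$

Since $\varepsilon_0$ and $\varepsilon_1$ converge to zero as $n \to \infty$, the result follows. □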

Lemma 3.7. Suppose that $\bar z \notin E$, $\{z^\delta\}_{\delta>0} \subset X$ and $z^\delta \rightharpoonup \bar z$ in $X$ as $\delta \to 0$. Then
for any $\varepsilon > 0$ there exists $\delta$ small enough such that

$$\frac{J_0^\delta(z^\delta)}{J_0^\delta(0)} < \varepsilon.$$

Proof. Let $C$ be the covariance operator of $\mu_0$, and $\{e_j\}_{j\in\mathbb{N}}$ the eigenfunctions of
$C$ scaled with respect to the inner product of $E$, the Cameron-Martin space of $\mu_0$,
so that $\{e_j\}_{j\in\mathbb{N}}$ forms an orthonormal basis in $E$. Let $\{\lambda_j\}$ be the corresponding
eigenvalues and $a_j = 1/\lambda_j$. Since $z^\delta \rightharpoonup \bar z$ as $\delta \to 0$,

$$e_j(z^\delta) \to e_j(\bar z), \quad \text{for any } j \in \mathbb{N}, \tag{3.5}$$

and as $\bar z \notin E$, for any $A > 0$, there exists $N$ sufficiently large and $\tilde\delta > 0$ sufficiently
small such that




therefore 



where Xj = ej{z). By (3.51, for 8i < 5 small enough we have B^^iz^^) C B^{z) and 



inf 




(3.6) 



Let r„ : X 



map z to (ei(z), . . . , e„(z)), and consider Jq n(-^) be defined as 



in (3.4). Having (3.6 1, and choosing 5 < 5i such that e 4(''l^ haw)* ^^^'2^ for any 

g-KaiXjH hOna;^) J^; 



n > N we can write 



e 2 



Kaixf-i ha„2:2 ) 



< 



B^(T„z^) 



g-i(aix^H hajva;^)g-i(^£cjH h^2:^+ajv+ia;?,+i---+a„a;^) 

-i(ai2;J+...+ajv2;?,)e-5(Tr^? + --- + T^^N+«« + ia:J,^i---+a„2;2) 



e <i 



< 



e 2 



(^^iH h^2;jv+''N + l2;jv+i---+a„£c„) ^-^^ 



2 /j 



2 JB'5(0) 



< 2e"i'^ . 



As $A > 0$ was arbitrary, the constant in the last line of the above equation can be
made arbitrarily small, by making $\delta$ sufficiently small and $n$ sufficiently large. Having
this and arguing in a similar way to the final paragraph of the proof of Lemma 3.6, the
result follows. □

Corollary 3.8. Suppose that $z \notin E$. Then

$$\lim_{\delta\to0} \frac{J_0^\delta(z)}{J_0^\delta(0)} = 0.$$



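Lemma 3.9. Consider $\{z^\delta\}_{\delta>0} \subset X$ and suppose that $z^\delta$ converges weakly and
not strongly to $0$ in $X$ as $\delta \to 0$. Then for any $\varepsilon > 0$ there exists $\delta$ small enough such
that

$$\frac{J_0^\delta(z^\delta)}{J_0^\delta(0)} < \varepsilon.$$

Proof. Since $z^\delta$ converges weakly and not strongly to $0$, we have

$$\liminf_{\delta\to0} \|z^\delta\|_X > 0$$

and therefore for $\delta_1$ small enough there exists $\alpha > 0$ such that $\|z^\delta\|_X > \alpha$ for any
$\delta < \delta_1$. Let $\lambda_j$, $a_j$ and $e_j$, $j \in \mathbb{N}$, be defined as in the proof of Lemma 3.7. Since
$z^\delta \rightharpoonup 0$ as $\delta \to 0$,

$$e_j(z^\delta) \to 0, \quad \text{for any } j \in \mathbb{N}. \tag{3.7}$$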



Also, as for /ig-almost every x €i X , x = X^jgN ^^'^ i^j ~ 

orthonormal basis in X*^^ (closure of X* in L'^{hq)) [5], we have 



^^(ej(a::))^ < oo for ^o-£^l™ost every a; G X. 



(3.8) 



Now, for any A > 0, let iV large enough such that ajv > Then, having (3.7 1 and 



(3.8), one can choose 62 < 5i small enough and Ni > N large enough so that for 



6 < 62 and n > Ni 



E(e,(/))^ < ^, and ± {e,{z')f > ^. 

Therefore, letting Jg „(z) and r„ be defined as in the proof of Lemma 
write 



3.7 



4n(0) 



If 2 , , 2 \ 

-^(aix-^-l l-a„a;,J j^; 



< 



f 1-2;^) -i(aiXiH yaKX%+(aN + i-A^)x%,-^--- + {a„-A^)x^) j 

Ib^(O) e^'^*^^" + i + '"+'^"''e~5('^i=^?+---+aN^N + ('i«+i-A2)a;^+i--- + (a„-A2)a;JJ 



_ 1 42/ Cq 
g 2^2 



< 



J J 



l/fil 2, i^JV 2i 2 



"Ma; 



< 2c 



_C»^2 



if (5 < (52 is small enough so that e'^'^ < 2. Having this and arguing in a similar way 
to the final paragraph of the proof of Lemma 3.6, the result follows. □

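Having these preparations in place, we can give the proof of Theorem 3.5.

Proof (of Theorem 3.5). i) We first show $\{z^\delta\}$ is bounded in $X$. By Assumption 2.1(ii),
for any $r > 0$ there exists $K = K(r) > 0$ such that

$$\Phi(u) \le K(r)$$

for any $u$ satisfying $\|u\|_X < r$; thus $K$ may be assumed to be a non-decreasing function
of $r$. This implies that

$$\max_{z\in X} \int_{B^\delta(z)} e^{-\Phi(u)}\,\mu_0(du) \ge \int_{B^\delta(0)} e^{-\Phi(u)}\,\mu_0(du) \ge e^{-K(\delta)} \int_{B^\delta(0)} \mu_0(du).$$

We assume that $\delta \le 1$ and then the inequality above shows that

$$\frac{J^\delta(z^\delta)}{\int_{B^\delta(0)}\mu_0(du)} \ge \frac1Z\, e^{-K(1)} = \varepsilon_1, \tag{3.9}$$

noting that $\varepsilon_1$ is independent of $\delta$.
We also can write

$$Z J^\delta(z) = \int_{B^\delta(z)} e^{-\Phi(u)}\,\mu_0(du) \le e^{-M} \int_{B^\delta(z)} \mu_0(du) =: e^{-M} J_0^\delta(z),$$

which implies that for any $z \in X$ and $\delta > 0$

$$J_0^\delta(z) \ge Z e^{M} J^\delta(z). \tag{3.10}$$

Now suppose $\{z^\delta\}$ is not bounded in $X$, so that for any $R > 0$ there exists $\delta_R$ such
that $\|z^{\delta_R}\|_X > R$ (with $\delta_R \to 0$ as $R \to \infty$). By (3.10), (3.9) and the definition of $z^{\delta_R}$ we
have

$$J_0^{\delta_R}(z^{\delta_R}) \ge Z e^{M} J^{\delta_R}(z^{\delta_R}) \ge Z e^{M} J^{\delta_R}(0) \ge e^{M} e^{-K(1)} J_0^{\delta_R}(0),$$

implying that for any $\delta_R$ and corresponding $z^{\delta_R}$

$$\frac{J_0^{\delta_R}(z^{\delta_R})}{J_0^{\delta_R}(0)} \ge e^{M - K(1)}.$$

This contradicts the result of Lemma 3.6 for $\delta_R$ small enough. Hence there
exist $R, \delta_R > 0$ such that

$$\|z^\delta\|_X \le R, \quad \text{for any } \delta < \delta_R.$$

Therefore there exists a $\bar z \in X$ and a subsequence of $\{z^\delta\}_{0<\delta<\delta_R}$ which converges
weakly in $X$ to $\bar z \in X$ as $\delta \to 0$.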

Now, suppose either

a) there is no strongly convergent subsequence of $\{z^\delta\}$ in $X$, or

b) if there is one, its limit $\bar z$ is not in $E$.

Let $U_E = \{u \in E : \|u\|_E \le 1\}$. Each of the above situations implies that for any
positive $A \in \mathbb{R}$, there is a $\delta^\dagger$ such that for any $\delta \le \delta^\dagger$,

$$B^\delta(z^\delta) \cap \big(B^\delta(0) + A\,U_E\big) = \emptyset. \tag{3.11}$$

We first show that $\bar z$ has to be in $E$. By definition of $z^\delta$ we have (for $\delta \le 1$)

$$1 \le \frac{J^\delta(z^\delta)}{J^\delta(0)} \le \frac{e^{-M}}{e^{-K(1)}}\, \frac{\int_{B^\delta(z^\delta)}\mu_0(du)}{\int_{B^\delta(0)}\mu_0(du)}. \tag{3.12}$$

Supposing $\bar z \notin E$, Lemma 3.7 shows that for any $\varepsilon > 0$ there exists $\delta$ small
enough such that

$$\frac{\int_{B^\delta(z^\delta)}\mu_0(du)}{\int_{B^\delta(0)}\mu_0(du)} < \varepsilon.$$

Hence choosing $A$ in (3.11) such that $e^{-A^2/2} < \frac12 e^{M} e^{-K(1)}$, and setting $\varepsilon = e^{-A^2/2}$,
from (3.12) we get $1 \le J^\delta(z^\delta)/J^\delta(0) < 1$, which is a contradiction. We therefore have
$\bar z \in E$.

Now, knowing that $\bar z \in E$, we can show that $z^\delta$ converges strongly in $X$.
Suppose not. Then for $z^\delta - \bar z$ the hypotheses of Lemma 3.9 are satisfied. Again
choosing $A$ in (3.11) such that $e^{-A^2/2} < \frac12 e^{M} e^{-K(1)}$, and setting $\varepsilon = e^{-A^2/2}$, from
Lemma 3.9 and (3.12) we get $1 \le J^\delta(z^\delta)/J^\delta(0) < 1$, which is a contradiction. Hence
there is a subsequence of $\{z^\delta\}$ converging strongly in $X$ to $\bar z \in E$.



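ii) Let $z^* = \operatorname{argmin}_{z\in E} I(z) \in E$; existence is assured by Proposition 3.4. By
Assumption 2.1(iii) we have

$$\frac{J^\delta(z^\delta)}{J^\delta(\bar z)} \le e^{-\Phi(z^\delta)+\Phi(\bar z)}\, e^{(L_1+L_2)\delta}\, \frac{\int_{B^\delta(z^\delta)}\mu_0(du)}{\int_{B^\delta(\bar z)}\mu_0(du)}$$

with $L_1 = L(\|z^\delta\|_X + \delta)$ and $L_2 = L(\|\bar z\|_X + \delta)$. Therefore, since $\Phi$ is continuous on
$X$ and $z^\delta \to \bar z$ in $X$,

$$\limsup_{\delta\to0} \frac{J^\delta(z^\delta)}{J^\delta(\bar z)} \le \limsup_{\delta\to0} \frac{\int_{B^\delta(z^\delta)}\mu_0(du)}{\int_{B^\delta(\bar z)}\mu_0(du)}.$$

Suppose $\{z^\delta\}$ is not bounded in $E$ or, if it is, it only converges weakly (and not strongly)
in $E$. Then $\|\bar z\|_E < \liminf_{\delta\to0} \|z^\delta\|_E$ and hence for small enough $\delta$, $\|\bar z\|_E < \|z^\delta\|_E$.
Therefore for the centred Gaussian measure $\mu_0$, since $\|z^\delta - \bar z\|_X \to 0$, we have

$$\limsup_{\delta\to0} \frac{\int_{B^\delta(z^\delta)}\mu_0(du)}{\int_{B^\delta(\bar z)}\mu_0(du)} \le 1.$$

Since by definition of $z^\delta$ we have $J^\delta(z^\delta) \ge J^\delta(\bar z)$ and hence

$$\liminf_{\delta\to0} \big(J^\delta(z^\delta)/J^\delta(\bar z)\big) \ge 1,$$

this implies that

$$\lim_{\delta\to0} \frac{J^\delta(z^\delta)}{J^\delta(\bar z)} = 1. \tag{3.13}$$

In the case where $\{z^\delta\}$ converges strongly to $\bar z$ in $E$, by the Cameron-Martin Theorem
we have

$$\frac{\int_{B^\delta(z^\delta)}\mu_0(du)}{\int_{B^\delta(\bar z)}\mu_0(du)} = \frac{e^{-\frac12\|z^\delta\|_E^2} \int_{B^\delta(0)} e^{\langle z^\delta, u\rangle_E}\,\mu_0(du)}{e^{-\frac12\|\bar z\|_E^2} \int_{B^\delta(0)} e^{\langle \bar z, u\rangle_E}\,\mu_0(du)},$$

and then by an argument very similar to the proof of Theorem 18.3 of [21] one can
show that

$$\lim_{\delta\to0} \frac{\int_{B^\delta(z^\delta)}\mu_0(du)}{\int_{B^\delta(\bar z)}\mu_0(du)} = 1,$$

and (3.13) follows again in a similar way. Therefore $\bar z$ is a MAP estimator of the measure
$\mu$.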

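It remains to show that $\bar z$ is a minimiser of $I$. Suppose $\bar z$ is not a minimiser of $I$,
so that $I(\bar z) - I(z^*) > 0$. Let $\delta_1$ be small enough so that in the equation before (3.2)
$1 < r_1(\delta) < e^{I(\bar z) - I(z^*)}$ for any $\delta < \delta_1$ and therefore

$$\frac{J^\delta(\bar z)}{J^\delta(z^*)} \le r_1(\delta)\, e^{-I(\bar z)+I(z^*)} < 1. \tag{3.14}$$

Let $\alpha = r_1(\delta)\, e^{-I(\bar z)+I(z^*)}$. We have

$$\frac{J^\delta(z^\delta)}{J^\delta(z^*)} = \frac{J^\delta(z^\delta)}{J^\delta(\bar z)}\, \frac{J^\delta(\bar z)}{J^\delta(z^*)}$$

and this by (3.14) and (3.13) implies that

$$\limsup_{\delta\to0} \frac{J^\delta(z^\delta)}{J^\delta(z^*)} \le \alpha \limsup_{\delta\to0} \frac{J^\delta(z^\delta)}{J^\delta(\bar z)} < 1,$$

which is a contradiction, since by definition of $z^\delta$, $J^\delta(z^\delta) \ge J^\delta(z^*)$ for any $\delta > 0$. □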
Corollary 3.10. Under the conditions of Theorem 3.5 we have the following:

i) Any MAP estimator, given by Definition 3.1, minimises the Onsager-Machlup
functional $I$.

ii) Any $z^* \in E$ which minimises the Onsager-Machlup functional $I$ is a MAP
estimator for the measure $\mu$ given by (1.1).

Proof.

i) Let $\tilde z$ be a MAP estimator. By Theorem 3.5 we know that $\{z^\delta\}$ has a
subsequence which strongly converges in $X$ to $\bar z$. Let $\{z^\alpha\}$ be the said subsequence.
Then by (3.13) one can show that

$$\lim_{\delta\to0} \frac{J^\delta(z^\delta)}{J^\delta(\bar z)} = \lim_{\alpha\to0} \frac{J^\alpha(z^\alpha)}{J^\alpha(\bar z)} = 1.$$

By the above equation and since $\tilde z$ is a MAP estimator, we can write

$$\lim_{\delta\to0} \frac{J^\delta(\tilde z)}{J^\delta(\bar z)} = \lim_{\delta\to0} \frac{J^\delta(\tilde z)}{J^\delta(z^\delta)}\, \lim_{\delta\to0} \frac{J^\delta(z^\delta)}{J^\delta(\bar z)} = 1.$$

Then Corollary 3.8 implies that $\tilde z \in E$, and supposing that $\tilde z$ is not a
minimiser of $I$ would result in a contradiction using an argument similar to the last
paragraph of the proof of the above theorem.
ii) Note that the assumptions of Theorem 3.5 imply those of Theorem 3.2. Since
$\bar z$ is a minimiser of $I$ as well, by Theorem 3.2 we have

$$\lim_{\delta\to0} \frac{J^\delta(\bar z)}{J^\delta(z^*)} = 1.$$

Then we can write

$$\lim_{\delta\to0} \frac{J^\delta(z^*)}{J^\delta(z^\delta)} = \lim_{\delta\to0} \frac{J^\delta(\bar z)}{J^\delta(z^\delta)}\, \lim_{\delta\to0} \frac{J^\delta(z^*)}{J^\delta(\bar z)} = 1.$$

The result follows by Definition 3.1.



4. Bayesian Inversion and Posterior Consistency. The structure (1.1),
where $\mu_0$ is Gaussian, arises in the application of the Bayesian methodology to the
solution of inverse problems. In that context it is interesting to study posterior
consistency: the idea that the posterior concentrates near the truth which gave rise to
the data, in the small noise or large data limits; these two limits are intimately related
and indeed there are theorems that quantify this connection for certain linear inverse
problems [65. In this section we describe the Bayesian approach to nonlinear inverse
problems, and then study posterior consistency of MAP estimators in both the small
noise and large data limits, as described in Theorems 4.4 and 4.1 respectively. Specifically
we characterize the sense in which the MAP estimators concentrate on the truth
underlying the data in the small noise and large data limits.

4.1. Inverse Problems. Consider the problem of estimating a function $u$ in a
Banach space $X$, from a given vector $y \in \mathbb{R}^J$, where

$$y = G(u) + \zeta; \tag{4.1}$$

here $G\colon X \to \mathbb{R}^J$ is a possibly nonlinear operator, and $\zeta$ is a realization of an
$\mathbb{R}^J$-valued centred Gaussian random variable with known covariance $\Sigma$. A prior
probability measure $\mu_0(du)$ is put on $u$, and the distribution of $y|u$ is given by (4.1), with
$\zeta$ assumed independent of $u$. In this paper we restrict ourselves to centred Gaussian
prior measures, and denote the covariance operator of $\mu_0$ by $C_0$: $\mu_0 = \mathcal{N}(0, C_0)$. Then,
under appropriate conditions on $\mu_0$ and $G$, Bayes theorem is interpreted as giving the
following formula for the Radon-Nikodym derivative of the posterior distribution $\mu^y$
on $u|y$ with respect to $\mu_0$:

$$\frac{d\mu^y}{d\mu_0}(u) \propto \exp(-\Phi(u; y)), \tag{4.2}$$

where

$$\Phi(u; y) = \frac12 \big|\Sigma^{-1/2}(y - G(u))\big|^2. \tag{4.3}$$

Derivation of Bayes formula (4.2) for problems with finite dimensional data, and $\zeta$ in
this form, is discussed in [7]. Clearly, then, Bayesian inverse problems with Gaussian
priors fall into the class of problems studied in this paper, for potentials $\Phi$ given by
(4.3) which depend on the observed data $y$. In this section we study MAP estimators
for problems of the form (4.2). It is convenient to extend the Onsager-Machlup
functional $I$ to a mapping from $X \times \mathbb{R}^J$ to $\overline{\mathbb{R}}$, defined as

$$I(u; y) = \begin{cases} \Phi(u; y) + \frac12\|u\|_E^2 & \text{if } u \in E, \text{ and} \\ +\infty & \text{else.} \end{cases}$$

We study properties of minimisers of the functional, in both the small noise and in
the large data limits. We assume that the data is found from application of $G$ to the
truth $u^\dagger$ with additional noise:

$$y = G(u^\dagger) + \zeta.$$

Theorems 4.1 and 4.4 show that weak limits of MAP estimators agree, when mapped
by $G$, with the image of the true function under $G$.



4.2. Large Data Limit. Let us denote the exact solution by $u^\dagger$ and suppose
that as data we have the following $n$ random vectors

$$y_j = \mathcal{G}(u^\dagger) + \eta_j, \quad j = 1, \dots, n,$$

with $y_j \in \mathbb{R}^K$ and $\eta_j \sim \mathcal{N}(0, C_1)$ independent identically distributed random variables.
Thus, in the general setting, we have $J = nK$, $G(\cdot) = (\mathcal{G}(\cdot), \dots, \mathcal{G}(\cdot))$ and $\Sigma$ a block
diagonal matrix with $C_1$ in each block. We have $n$ independent observations each
polluted by $O(1)$ noise, and we study the limit $n \to \infty$. Corresponding to this set of
data and given the prior measure $\mu_0 \sim \mathcal{N}(0, C_0)$ we have the following formula for the
posterior measure on $u$:

$$\frac{d\mu^{y_1,\dots,y_n}}{d\mu_0}(u) \propto \exp\Big(-\frac12 \sum_{j=1}^n |y_j - \mathcal{G}(u)|_{C_1}^2\Big).$$

Here, and in the following, we use the notation $\langle\cdot,\cdot\rangle_{C_1} = \langle C_1^{-1/2}\cdot, C_1^{-1/2}\cdot\rangle$ and
$|\cdot|_{C_1}^2 = \langle\cdot,\cdot\rangle_{C_1}$. By Corollary 3.10 MAP estimators for this problem are minimisers of

$$I_n := \|u\|_E^2 + \sum_{j=1}^n |y_j - \mathcal{G}(u)|_{C_1}^2. \tag{4.4}$$
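To see what minimising $I_n$ looks like in practice, the following sketch treats a scalar toy problem with $E = \mathbb{R}$, $\|u\|_E = |u|$ and a hypothetical non-injective forward map $\mathcal{G}(u) = u^2$; the map, the truth `u_true`, the noise variance `c1` and the grid are all illustrative assumptions. It minimises $I_n$ on a grid for increasing $n$ and records how far $\mathcal{G}$ of the minimiser is from $\mathcal{G}(u^\dagger)$.

```python
import numpy as np

rng = np.random.default_rng(0)

def G(u):
    # Hypothetical non-injective forward map, so that G(u) = G(u_true)
    # has the two solutions u = +1 and u = -1.
    return u**2

u_true, c1 = 1.0, 1.0                      # truth and noise variance C_1 (assumed)
grid = np.linspace(-3.0, 3.0, 60001)       # candidate values of u; E = R, ||u||_E = |u|

def map_estimate(n):
    """Minimise I_n(u) = |u|^2 + sum_j |y_j - G(u)|^2 / c1 over the grid."""
    y = G(u_true) + np.sqrt(c1) * rng.standard_normal(n)
    g = G(grid)
    # Expand the sum of squares so only sum(y) and sum(y^2) are needed.
    misfit = (n * g**2 - 2.0 * g * y.sum() + (y**2).sum()) / c1
    return grid[np.argmin(grid**2 + misfit)]

estimates = {n: map_estimate(n) for n in (10, 1000, 100000)}
errors = {n: abs(G(u) - G(u_true)) for n, u in estimates.items()}
```

Since $\mathcal{G}$ is not injective here, a minimiser may sit near $-1$ rather than near the truth $+1$; what can be expected to become small as $n$ grows is the misfit in the image of $\mathcal{G}$, which is the quantity stored in `errors`.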

Our interest is in studying properties of the limits of minimisers $u_n$ of $I_n$, namely the
MAP estimators corresponding to the preceding family of posterior measures. We
have the following theorem concerning the behaviour of $u_n$ when $n \to \infty$.

Theorem 4.1. Assume that $\mathcal{G}\colon X \to \mathbb{R}^K$ is Lipschitz on bounded sets and $u^\dagger \in
E$. For every $n \in \mathbb{N}$, let $u_n \in E$ be a minimiser of $I_n$ given by (4.4). Then there
exists a $u^* \in E$ and a subsequence of $\{u_n\}_{n\in\mathbb{N}}$ that converges weakly to $u^*$ in $E$,
almost surely. For any such $u^*$ we have $\mathcal{G}(u^*) = \mathcal{G}(u^\dagger)$.

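We describe some preliminary calculations useful in the proof of this theorem,
then give Lemma 4.2, also useful in the proof, and finally give the proof itself.

We first observe that, under the assumption that $\mathcal{G}$ is Lipschitz on bounded sets,
Assumptions 2.1 hold for $\Phi$. We note that

$$I_n = \|u\|_E^2 + \sum_{j=1}^n |y_j - \mathcal{G}(u)|_{C_1}^2 = \|u\|_E^2 + n\,|\mathcal{G}(u^\dagger) - \mathcal{G}(u)|_{C_1}^2 + 2\sum_{j=1}^n \langle \mathcal{G}(u^\dagger) - \mathcal{G}(u), C_1^{-1}\eta_j\rangle + \sum_{j=1}^n |\eta_j|_{C_1}^2.$$

Hence

$$\operatorname*{argmin}_u I_n = \operatorname*{argmin}_u \Big\{ \|u\|_E^2 + n\,|\mathcal{G}(u^\dagger) - \mathcal{G}(u)|_{C_1}^2 + 2\sum_{j=1}^n \langle \mathcal{G}(u^\dagger) - \mathcal{G}(u), C_1^{-1}\eta_j\rangle \Big\}.$$

Define

$$J_n(u) = |\mathcal{G}(u^\dagger) - \mathcal{G}(u)|_{C_1}^2 + \frac1n \|u\|_E^2 + \frac2n \sum_{j=1}^n \langle \mathcal{G}(u^\dagger) - \mathcal{G}(u), C_1^{-1}\eta_j\rangle.$$

We have

$$\operatorname*{argmin}_u I_n = \operatorname*{argmin}_u J_n.$$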
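The identity between the two minimisation problems comes from the fact that $I_n$ and $nJ_n$ differ only by the $u$-independent term $\sum_j |\eta_j|_{C_1}^2$. The short check below verifies this numerically for a scalar example; the forward map, the truth, the noise variance and the sample size are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def G(u):
    return np.sin(u) + 0.5 * u             # hypothetical scalar forward map

u_true, c1, n = 0.7, 0.25, 50              # truth, noise variance C_1, sample size
eta = np.sqrt(c1) * rng.standard_normal(n)
y = G(u_true) + eta

def I_n(u):
    # (4.4): ||u||_E^2 + sum_j |y_j - G(u)|^2_{C_1}, with ||u||_E = |u| on R.
    return u**2 + np.sum((y - G(u))**2) / c1

def J_n(u):
    # Rescaled functional; the u-independent term sum_j |eta_j|^2_{C_1} is dropped.
    d = G(u_true) - G(u)
    return d**2 / c1 + u**2 / n + (2.0 / n) * np.sum(d * eta) / c1

us = np.linspace(-2.0, 2.0, 9)
gaps = np.array([I_n(u) - n * J_n(u) for u in us])   # should not depend on u
const = np.sum(eta**2) / c1
```

Every entry of `gaps` should equal `const` up to rounding, so the two functionals have the same minimisers.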

Lemma 4.2. Assume that $\mathcal{G}\colon X \to \mathbb{R}^K$ is Lipschitz on bounded sets. Then for
fixed $n \in \mathbb{N}$ and almost surely, there exists $u_n \in E$ such that

$$J_n(u_n) = \inf_{u\in E} J_n(u).$$

Proof. We first observe that, under the assumption that $\mathcal{G}$ is Lipschitz on bounded
sets and because for a given $n$ and fixed realisations $\eta_1, \dots, \eta_n$ there exists an $r > 0$
such that $\max\{|y_1|, \dots, |y_n|\} < r$, Assumptions 2.1 hold for $\Phi$. Since $\operatorname{argmin}_u I_n =
\operatorname{argmin}_u J_n$ the result follows by Proposition 3.4. □

We may now prove the posterior consistency theorem. From (4.6) onwards the
proof is an adaptation of the proof of Theorem 2 of [3].



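Proof (of Theorem 4.1). By definition of $u_n$ we have

$$|\mathcal{G}(u^\dagger) - \mathcal{G}(u_n)|_{C_1}^2 + \frac1n \|u_n\|_E^2 + \frac2n \sum_{j=1}^n \langle \mathcal{G}(u^\dagger) - \mathcal{G}(u_n), C_1^{-1}\eta_j\rangle \le \frac1n \|u^\dagger\|_E^2.$$

Therefore

$$|\mathcal{G}(u^\dagger) - \mathcal{G}(u_n)|_{C_1}^2 + \frac1n \|u_n\|_E^2 \le \frac1n \|u^\dagger\|_E^2 + \frac2n\, |\mathcal{G}(u^\dagger) - \mathcal{G}(u_n)|_{C_1} \Big|\sum_{j=1}^n C_1^{-1/2}\eta_j\Big|.$$

Using Young's inequality for the last term in the right-hand side we get

$$\frac12 |\mathcal{G}(u^\dagger) - \mathcal{G}(u_n)|_{C_1}^2 + \frac1n \|u_n\|_E^2 \le \frac1n \|u^\dagger\|_E^2 + \frac{2}{n^2} \Big|\sum_{j=1}^n C_1^{-1/2}\eta_j\Big|^2.$$

Taking expectation and noting that the $\{\eta_j\}$ are independent, we obtain

$$\frac12\, \mathbb{E}|\mathcal{G}(u^\dagger) - \mathcal{G}(u_n)|_{C_1}^2 + \frac1n\, \mathbb{E}\|u_n\|_E^2 \le \frac1n \|u^\dagger\|_E^2 + \frac{2K}{n},$$

where $K = \mathbb{E}|C_1^{-1/2}\eta_1|^2$. This implies that

$$\mathbb{E}|\mathcal{G}(u^\dagger) - \mathcal{G}(u_n)|_{C_1}^2 \to 0 \quad \text{as } n \to \infty \tag{4.5}$$

and

$$\mathbb{E}\|u_n\|_E^2 \le \|u^\dagger\|_E^2 + 2K. \tag{4.6}$$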
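The two steps leading to (4.5) and (4.6) can be spelled out as follows. This is a sketch under the assumptions already in force, namely that the $\eta_j$ are independent, centred and identically distributed, with $K = \mathbb{E}|C_1^{-1/2}\eta_1|^2$; the abbreviations $a$ and $S_n$ are introduced here only for readability.

```latex
% Abbreviations (introduced for this sketch only):
%   a   = |G(u^\dagger) - G(u_n)|_{C_1},   S_n = \sum_{j=1}^n C_1^{-1/2}\eta_j .
\begin{align*}
% Young's inequality 2xy \le x^2/2 + 2y^2 with x = a, y = |S_n|/n:
\frac{2}{n}\, a\, |S_n| &\le \frac12 a^2 + \frac{2}{n^2} |S_n|^2, \\
% Cross terms vanish because the \eta_j are independent with mean zero:
\mathbb{E}|S_n|^2
  &= \sum_{j=1}^n \mathbb{E}\big|C_1^{-1/2}\eta_j\big|^2
   + \sum_{j \neq k} \mathbb{E}\big\langle C_1^{-1/2}\eta_j,\, C_1^{-1/2}\eta_k \big\rangle
   = nK, \\
% Hence the noise contribution to the bound is
\frac{2}{n^2}\, \mathbb{E}|S_n|^2 &= \frac{2K}{n}.
\end{align*}
```

Dropping the non-negative term $\frac1n \mathbb{E}\|u_n\|_E^2$ from the resulting bound gives (4.5), while dropping $\frac12 \mathbb{E}|\mathcal{G}(u^\dagger) - \mathcal{G}(u_n)|_{C_1}^2$ and multiplying by $n$ gives (4.6).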



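1) We first show using (4.6) that there exist $u^* \in E$ and a subsequence $\{u_{n_k(k)}\}_{k\in\mathbb{N}}$
of $\{u_n\}$ such that

$$\mathbb{E}\langle u_{n_k(k)}, v\rangle_E \to \mathbb{E}\langle u^*, v\rangle_E, \quad \text{for any } v \in E. \tag{4.7}$$

Let $\{\phi_j\}_{j\in\mathbb{N}}$ be a complete orthonormal system for $E$. Then

$$\mathbb{E}\langle u_n, \phi_1\rangle_E \le \mathbb{E}\|u_n\|_E \|\phi_1\|_E \le \big(\mathbb{E}\|u_n\|_E^2\big)^{1/2} \le \big(\|u^\dagger\|_E^2 + 2K\big)^{1/2}.$$

Therefore there exists $\xi_1 \in \mathbb{R}$ and a subsequence $\{u_{n_1(k)}\}_{k\in\mathbb{N}}$ of $\{u_n\}_{n\in\mathbb{N}}$, such that
$\mathbb{E}\langle u_{n_1(k)}, \phi_1\rangle \to \xi_1$. Now considering $\mathbb{E}\langle u_{n_1(k)}, \phi_2\rangle$ and using the same argument we
conclude that there exists $\xi_2 \in \mathbb{R}$ and a subsequence $\{u_{n_2(k)}\}_{k\in\mathbb{N}}$ of $\{u_{n_1(k)}\}_{k\in\mathbb{N}}$ such
that $\mathbb{E}\langle u_{n_2(k)}, \phi_2\rangle \to \xi_2$. Continuing similarly we can show that there exist $\{\xi_j\} \in \mathbb{R}^\infty$
and $\{u_{n_1(k)}\}_{k\in\mathbb{N}} \supset \{u_{n_2(k)}\}_{k\in\mathbb{N}} \supset \cdots \supset \{u_{n_j(k)}\}_{k\in\mathbb{N}}$ such that $\mathbb{E}\langle u_{n_j(k)}, \phi_j\rangle \to \xi_j$ for
any $j \in \mathbb{N}$ and as $k \to \infty$. Therefore

$$\mathbb{E}\langle u_{n_k(k)}, \phi_j\rangle_E \to \xi_j, \quad \text{as } k \to \infty \text{ for any } j \in \mathbb{N}.$$

We need to show that $\{\xi_j\} \in \ell^2(\mathbb{R})$. We have, for any $N \in \mathbb{N}$,

$$\sum_{j=1}^N \xi_j^2 \le \lim_{k\to\infty} \mathbb{E} \sum_{j=1}^N \langle u_{n_k(k)}, \phi_j\rangle_E^2 \le \limsup_{k\to\infty} \mathbb{E}\|u_{n_k(k)}\|_E^2 \le \|u^\dagger\|_E^2 + 2K.$$

Therefore $\{\xi_j\} \in \ell^2(\mathbb{R})$ and $u^* := \sum_{j=1}^\infty \xi_j \phi_j \in E$. We can now write for any nonzero
$v \in E$

$$\mathbb{E}\langle u_{n_k(k)} - u^*, v\rangle_E = \mathbb{E} \sum_{j=1}^\infty \langle v, \phi_j\rangle_E \langle u_{n_k(k)} - u^*, \phi_j\rangle_E
\le N \|v\|_E\, \mathbb{E} \sup_{j\in\{1,\dots,N\}} |\langle u_{n_k(k)} - u^*, \phi_j\rangle_E| + \big(\|u^\dagger\|_E^2 + 2K\big)^{1/2} \sum_{j=N}^\infty |\langle v, \phi_j\rangle_E|.$$

Now for any fixed $\varepsilon > 0$ we choose $N$ large enough so that

$$\big(\|u^\dagger\|_E^2 + 2K\big)^{1/2} \sum_{j=N}^\infty |\langle v, \phi_j\rangle_E| < \frac12 \varepsilon$$

and then $k$ large enough so that

$$N \|v\|_E\, \mathbb{E}|\langle u_{n_k(k)} - u^*, \phi_j\rangle_E| < \frac12 \varepsilon \quad \text{for any } 1 \le j \le N.$$

This demonstrates that $\mathbb{E}\langle u_{n_k(k)} - u^*, v\rangle_E \to 0$ as $k \to \infty$.

2) Now we show almost sure existence of a convergent subsequence of $\{u_{n_k(k)}\}$.
By (4.5) we have $|\mathcal{G}(u_{n_k(k)}) - \mathcal{G}(u^\dagger)|_{C_1} \to 0$ in probability as $k \to \infty$. Therefore there
exists a subsequence $\{u_{m(k)}\}$ of $\{u_{n_k(k)}\}$ such that

$$\mathcal{G}(u_{m(k)}) \to \mathcal{G}(u^\dagger) \quad \text{a.s. as } k \to \infty.$$

Now by (4.7) we have $\langle u_{m(k)} - u^*, v\rangle_E \to 0$ in probability as $k \to \infty$ and hence there
exists a subsequence $\{u_{\hat m(k)}\}$ of $\{u_{m(k)}\}$ such that

$$u_{\hat m(k)} \rightharpoonup u^* \quad \text{in } E \text{ a.s. as } k \to \infty.$$

Since $E$ is compactly embedded in $X$, this implies that $u_{\hat m(k)} \to u^*$ in $X$ almost surely
as $k \to \infty$. The result now follows by continuity of $\mathcal{G}$. □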

In the case that $u^\dagger \in X$ (and not necessarily in $E$), we have the following weaker result:

Corollary 4.3. Suppose that $\mathcal{G}$ and $u_n$ satisfy the assumptions of Theorem 4.1, and that $u^\dagger \in X$. Then there exists a subsequence of $\{\mathcal{G}(u_n)\}_{n\in\mathbb{N}}$ converging to $\mathcal{G}(u^\dagger)$ almost surely.

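Proof. For any $\varepsilon > 0$, by density of $E$ in $X$, there exists $v \in E$ such that $\|u^\dagger - v\|_X \le \varepsilon$. Then by definition of $u_n$ we can write

\[ |\mathcal{G}(u^\dagger)-\mathcal{G}(u_n)|_{C_1}^2 + \frac{1}{n}\|u_n\|_E^2 + \frac{2}{n}\sum_{j=1}^{n}\langle \mathcal{G}(u^\dagger)-\mathcal{G}(u_n),\, C_1^{-1}\eta_j\rangle \le |\mathcal{G}(u^\dagger)-\mathcal{G}(v)|_{C_1}^2 + \frac{1}{n}\|v\|_E^2 + \frac{2}{n}\sum_{j=1}^{n}\langle \mathcal{G}(u^\dagger)-\mathcal{G}(v),\, C_1^{-1}\eta_j\rangle. \]

Therefore, dropping $\frac{1}{n}\|u_n\|_E^2$ in the left-hand side, and using Young's inequality we get

\[ \frac{1}{2}|\mathcal{G}(u^\dagger)-\mathcal{G}(u_n)|_{C_1}^2 \le 2|\mathcal{G}(u^\dagger)-\mathcal{G}(v)|_{C_1}^2 + \frac{1}{n}\|v\|_E^2 + \frac{3}{n^2}\Big|\sum_{j=1}^{n} C_1^{-1/2}\eta_j\Big|^2. \]

By local Lipschitz continuity of $\mathcal{G}$, $|\mathcal{G}(u^\dagger) - \mathcal{G}(v)|_{C_1} \le C\varepsilon$, and therefore taking the expectations and noting the independence of $\{\eta_j\}$ we get

\[ \mathbb{E}|\mathcal{G}(u^\dagger)-\mathcal{G}(u_n)|_{C_1}^2 \le 4C^2\varepsilon^2 + \frac{2}{n}\|v\|_E^2 + \frac{6K}{n}, \]

implying that

\[ \limsup_{n\to\infty}\mathbb{E}|\mathcal{G}(u^\dagger)-\mathcal{G}(u_n)|_{C_1}^2 \le 4C^2\varepsilon^2. \]

Since the liminf is nonnegative and $\varepsilon$ was arbitrary, we have $\lim_{n\to\infty}\mathbb{E}|\mathcal{G}(u^\dagger)-\mathcal{G}(u_n)|_{C_1}^2 = 0$. This implies that $|\mathcal{G}(u^\dagger)-\mathcal{G}(u_n)|_{C_1} \to 0$ in probability. Therefore there exists a subsequence of $\{\mathcal{G}(u_n)\}$ which converges to $\mathcal{G}(u^\dagger)$ almost surely. □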

4.3. Small Noise Limit. Consider the case where as data we have the random vector

\[ y_n = \mathcal{G}(u^\dagger) + \frac{1}{n}\eta_n, \tag{4.8} \]

for $n\in\mathbb{N}$ and with $u^\dagger$ again as the true solution and $\eta_j \sim \mathcal{N}(0, C_1)$, $j\in\mathbb{N}$, Gaussian random vectors in $\mathbb{R}^K$. Thus, in the preceding general setting, we have $G = \mathcal{G}$ and $J = K$. Rather than having $n$ independent observations, we have an observation noise scaled by small $\gamma = 1/n$ converging to zero. For this data and given the prior measure $\mu_0$ on $u$, we have the following formula for the posterior measure:

\[ \frac{d\mu^{y_n}}{d\mu_0}(u) \propto \exp\Big(-\frac{n^2}{2}|y_n - \mathcal{G}(u)|_{C_1}^2\Big). \]

By the result of the previous section, the MAP estimators for the above measure are the minimisers of

\[ I_n(u) := \|u\|_E^2 + n^2|y_n - \mathcal{G}(u)|_{C_1}^2. \tag{4.9} \]

Our interest is in studying properties of the limits of minimisers of $I_n$ as $n\to\infty$. We have the following almost sure convergence result.

Theorem 4.4. Assume that $\mathcal{G}: X \to \mathbb{R}^K$ is Lipschitz on bounded sets, and $u^\dagger \in E$. For every $n\in\mathbb{N}$, let $u_n \in E$ be a minimiser of $I_n(u)$ given by (4.9). Then there exist a $u^* \in E$ and a subsequence of $\{u_n\}_{n\in\mathbb{N}}$ that converges weakly to $u^*$ in $E$, almost surely. For any such $u^*$ we have $\mathcal{G}(u^*) = \mathcal{G}(u^\dagger)$.
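The statement of Theorem 4.4 can be illustrated in a finite-dimensional toy setting. The sketch below is only an illustration under loud assumptions: it takes $X = E = \mathbb{R}^3$ with identity prior covariance, $C_1 = I$, a hypothetical linear forward map `A` that does not observe the third coordinate, and a single noise draw rescaled by $1/n$. None of these choices come from the paper. In this case the minimiser of (4.9) has a closed form through the normal equations.

```python
import numpy as np

def map_small_noise(A, y_n, n):
    """Minimiser of I_n(u) = |u|^2 + n^2 |y_n - A u|^2 (toy version of (4.9))."""
    d = A.shape[1]
    # Setting the gradient to zero gives (I + n^2 A^T A) u = n^2 A^T y_n.
    return np.linalg.solve(np.eye(d) + n**2 * A.T @ A, n**2 * A.T @ y_n)

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])        # hypothetical forward map, third coordinate unobserved
u_true = np.array([1.0, -1.0, 0.5])    # hypothetical truth
eta = rng.standard_normal(2)           # one fixed noise draw, scaled by 1/n below

for n in (1, 10, 100, 1000):
    y_n = A @ u_true + eta / n
    u_n = map_small_noise(A, y_n, n)
    # The data misfit A u_n - A u_true shrinks with n; u_n itself need not approach u_true.
    print(n, np.linalg.norm(A @ u_n - A @ u_true), u_n)
```

In this toy problem the third component of every `u_n` is zero, so the limit point differs from `u_true` even though its image under `A` agrees with that of `u_true`. This matches the form of the conclusion, which asserts $\mathcal{G}(u^*) = \mathcal{G}(u^\dagger)$ rather than $u^* = u^\dagger$.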



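Proof. The proof is very similar to that of Theorem 4.1 and so we only sketch differences. We have

\[ I_n(u) = \|u\|_E^2 + n^2\Big|\mathcal{G}(u^\dagger) + \frac{1}{n}\eta_n - \mathcal{G}(u)\Big|_{C_1}^2 = \|u\|_E^2 + n^2|\mathcal{G}(u^\dagger)-\mathcal{G}(u)|_{C_1}^2 + |\eta_n|_{C_1}^2 + 2n\langle \mathcal{G}(u^\dagger)-\mathcal{G}(u),\, C_1^{-1}\eta_n\rangle. \]

Letting

\[ J_n(u) = |\mathcal{G}(u^\dagger)-\mathcal{G}(u)|_{C_1}^2 + \frac{1}{n^2}\|u\|_E^2 + \frac{2}{n}\langle \mathcal{G}(u^\dagger)-\mathcal{G}(u),\, C_1^{-1}\eta_n\rangle, \]

we hence have $\operatorname{argmin}_u I_n = \operatorname{argmin}_u J_n$. For this $J_n$ the result of Lemma 4.2 holds true, using an argument similar to the large data case. The result of Theorem 4.1 carries over as well. Indeed, by definition of $u_n$, we have

\[ |\mathcal{G}(u^\dagger)-\mathcal{G}(u_n)|_{C_1}^2 + \frac{1}{n^2}\|u_n\|_E^2 + \frac{2}{n}\langle \mathcal{G}(u^\dagger)-\mathcal{G}(u_n),\, C_1^{-1}\eta_n\rangle \le \frac{1}{n^2}\|u^\dagger\|_E^2. \]

Therefore

\[ |\mathcal{G}(u^\dagger)-\mathcal{G}(u_n)|_{C_1}^2 + \frac{1}{n^2}\|u_n\|_E^2 \le \frac{1}{n^2}\|u^\dagger\|_E^2 + \frac{2}{n}\,|\mathcal{G}(u^\dagger)-\mathcal{G}(u_n)|_{C_1}\,|C_1^{-1/2}\eta_n|. \]

Using Young's inequality for the last term in the right-hand side we get

\[ \frac{1}{2}|\mathcal{G}(u^\dagger)-\mathcal{G}(u_n)|_{C_1}^2 + \frac{1}{n^2}\|u_n\|_E^2 \le \frac{1}{n^2}\|u^\dagger\|_E^2 + \frac{2}{n^2}|C_1^{-1/2}\eta_n|^2. \]

Taking expectation we obtain

\[ \frac{1}{2}\mathbb{E}|\mathcal{G}(u^\dagger)-\mathcal{G}(u_n)|_{C_1}^2 + \frac{1}{n^2}\mathbb{E}\|u_n\|_E^2 \le \frac{1}{n^2}\|u^\dagger\|_E^2 + \frac{2K}{n^2}. \]

This implies that

\[ \mathbb{E}|\mathcal{G}(u^\dagger)-\mathcal{G}(u_n)|_{C_1}^2 \to 0 \quad \text{as } n\to\infty \tag{4.10} \]

and

\[ \mathbb{E}\|u_n\|_E^2 \le \|u^\dagger\|_E^2 + 2K. \tag{4.11} \]

Having (4.10) and (4.11), and with the same argument as the proof of Theorem 4.1, it follows that there exist a $u^* \in E$ and a subsequence of $\{u_n\}$ that converges weakly to $u^*$ in $E$ almost surely, and for any such $u^*$ we have $\mathcal{G}(u^*) = \mathcal{G}(u^\dagger)$. □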

As in the large data case, here also if we have $u^\dagger \in X$ and we do not restrict the true solution to be in the Cameron-Martin space $E$, one can prove, in a similar way to the argument of the proof of Corollary 4.3, the following weaker convergence result:

Corollary 4.5. Suppose that $\mathcal{G}$ and $u_n$ satisfy the assumptions of Theorem 4.4, and that $u^\dagger \in X$. Then there exists a subsequence of $\{\mathcal{G}(u_n)\}_{n\in\mathbb{N}}$ converging to $\mathcal{G}(u^\dagger)$ almost surely.

5. Applications in Fluid Mechanics. In this section we present an application 
of the methods presented above to filtering and smoothing in fluid dynamics, which 
is relevant to data assimilation applications in oceanography and meteorology. We 
link the MAP estimators introduced in this paper to the variational methods used in 
applications [5], and we demonstrate posterior consistency in this context. 

We consider the 2D Navier-Stokes equation on the torus $\mathbb{T}^2 := [-1,1)\times[-1,1)$ with periodic boundary conditions:

\[ \begin{aligned} \partial_t v - \nu\Delta v + v\cdot\nabla v + \nabla p &= f && \text{for all } (x,t)\in\mathbb{T}^2\times(0,\infty),\\ \nabla\cdot v &= 0 && \text{for all } (x,t)\in\mathbb{T}^2\times(0,\infty),\\ v &= u && \text{for all } (x,t)\in\mathbb{T}^2\times\{0\}. \end{aligned} \]

Here $v: \mathbb{T}^2\times(0,\infty)\to\mathbb{R}^2$ is a time-dependent vector field representing the velocity, $p: \mathbb{T}^2\times(0,\infty)\to\mathbb{R}$ is a time-dependent scalar field representing the pressure, $f: \mathbb{T}^2\to\mathbb{R}^2$ is a vector field representing the forcing (which we assume to be time-independent for simplicity), and $\nu$ is the viscosity. We are interested in the inverse problem of determining the initial velocity field $u$ from pointwise measurements of the velocity field at later times. This is a model for the situation in weather forecasting where observations of the atmosphere are used to improve the initial condition used for forecasting. For simplicity we assume that the initial velocity field is divergence-free and integrates to zero over $\mathbb{T}^2$, noting that this property will be preserved in time.



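Define

\[ \mathcal{H} := \Big\{ \text{trigonometric polynomials } u: \mathbb{T}^2\to\mathbb{R}^2 \;\Big|\; \nabla\cdot u = 0,\ \int_{\mathbb{T}^2} u(x)\,dx = 0 \Big\} \]

and $H$ as the closure of $\mathcal{H}$ with respect to the $(L^2(\mathbb{T}^2))^2$ norm. We define the map $P: (L^2(\mathbb{T}^2))^2 \to H$ to be the Leray-Helmholtz orthogonal projector (see [M]). Given $k = (k_1,k_2)^T$, define $k^\perp := (k_2,-k_1)^T$. Then an orthonormal basis for $H$ is given by $\psi_k: \mathbb{R}^2\to\mathbb{R}^2$, where

\[ \psi_k(x) := \frac{k^\perp}{|k|}\exp(\pi i\,k\cdot x) \]

for $k\in\mathbb{Z}^2\setminus\{0\}$. Thus for $u\in H$ we may write

\[ u = \sum_{k\in\mathbb{Z}^2\setminus\{0\}} u_k(t)\,\psi_k(x) \]

where, since $u$ is a real-valued function, we have the reality constraint $u_{-k} = -\bar{u}_k$. Using the Fourier decomposition of $u$, we define the fractional Sobolev spaces

\[ H^s := \Big\{ u\in H \;\Big|\; \sum_{k\in\mathbb{Z}^2\setminus\{0\}} (\pi^2|k|^2)^s|u_k|^2 < \infty \Big\} \]

with the norm $\|u\|_s := \big(\sum_k (\pi^2|k|^2)^s|u_k|^2\big)^{1/2}$, where $s\in\mathbb{R}$. If $A = -P\Delta$, the Stokes operator, then $H^s = D(A^{s/2})$. We assume that $f\in H^s$ for some $s>0$.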
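As a small worked illustration of the fractional Sobolev norm just defined, the following sketch evaluates $\|u\|_s = \big(\sum_k (\pi^2|k|^2)^s|u_k|^2\big)^{1/2}$ for a finite set of Fourier coefficients. The function name `sobolev_norm`, the dictionary layout, and the sample coefficients are hypothetical choices made for this example only.

```python
import math

def sobolev_norm(coeffs, s):
    """Return (sum_k (pi^2 |k|^2)^s |u_k|^2)^(1/2) for coefficients keyed by k = (k1, k2)."""
    total = 0.0
    for (k1, k2), u_k in coeffs.items():
        if (k1, k2) == (0, 0):
            raise ValueError("the zero wavenumber is excluded from H")
        total += (math.pi**2 * (k1**2 + k2**2)) ** s * abs(u_k) ** 2
    return math.sqrt(total)

# Hypothetical coefficients obeying the reality constraint u_{-k} = -conj(u_k).
coeffs = {(1, 0): 0.5 + 0.5j, (-1, 0): -0.5 + 0.5j}

print(sobolev_norm(coeffs, 0.0))  # s = 0 gives the L^2-type norm
print(sobolev_norm(coeffs, 1.0))  # s = 1 weights each mode by pi |k|
```

Because both modes have $|k| = 1$ and the squared moduli sum to one, the $s=0$ norm equals 1 and the $s=1$ norm equals $\pi$, which gives a quick sanity check on the weighting.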

Let $t_\ell = \ell h$, for $\ell = 0,\dots,L$, and define $v_\ell \in \mathbb{R}^M$ to be the set of pointwise values of the velocity field given by $\{v(x_m,t_\ell)\}_{m\in\mathbb{M}}$ where $\mathbb{M}$ is some finite set of points in $\mathbb{T}^2$ with cardinality $M/2$. Note that each $v_\ell$ depends on $u$ and we may define $\mathcal{G}_\ell: H\to\mathbb{R}^M$ by $\mathcal{G}_\ell(u) = v_\ell$. We let $\{\eta_\ell\}_{\ell\in\{1,\dots,L\}}$ be a set of random variables in $\mathbb{R}^M$ which perturbs the points $\{v_\ell\}_{\ell\in\{1,\dots,L\}}$ to generate the observations $\{y_\ell\}_{\ell\in\{1,\dots,L\}}$ in $\mathbb{R}^M$ given by

\[ y_\ell := v_\ell + \gamma\eta_\ell, \quad \ell\in\{1,\dots,L\}. \]

We let $y = \{y_\ell\}_{\ell=1}^{L}$, the accumulated data up to time $T = Lh$, with similar notation for $\eta$, and define $\mathcal{G}: H\to\mathbb{R}^{ML}$ by $\mathcal{G}(u) = (\mathcal{G}_1(u),\dots,\mathcal{G}_L(u))$. We now solve the inverse problem of finding $u$ from $y = \mathcal{G}(u) + \gamma\eta$. We assume that the prior distribution on $u$ is a Gaussian $\mu_0 = \mathcal{N}(0,C_0)$, with the property that $\mu_0(H) = 1$, and that the observational noise $\{\eta_\ell\}_{\ell\in\{1,\dots,L\}}$ is i.i.d. in $\mathbb{R}^M$, independent of $u$, with $\eta_1$ distributed according to a Gaussian measure $\mathcal{N}(0,I)$. If we define

\[ \Phi(u) = \frac{1}{2\gamma^2}\sum_{j=1}^{L}|y_j - \mathcal{G}_j(u)|^2, \]

then under the preceding assumptions the Bayesian inverse problem for the posterior measure $\mu^y$ for $u|y$ is well-defined and is Lipschitz in $y$ with respect to the Hellinger metric (see [7]). The Onsager-Machlup functional in this case is given by

\[ I_{NS}(u) = \frac{1}{2}\|u\|_{C_0}^2 + \Phi(u). \]

We are in the setting of subsection 4.3 with $\gamma = 1/n$ and $K = ML$. In the applied literature approaches to assimilating data into mathematical models based on minimising $I_{NS}$ are known as variational methods, and sometimes as 4DVAR [2].
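To make the structure of such a variational objective concrete, here is a minimal sketch. It assumes the functional has the form of a prior term $\frac12 u^T C_0^{-1}u$ plus a data misfit $\frac{1}{2\gamma^2}\sum_\ell |y_\ell - \mathcal{G}_\ell(u)|^2$. The function `four_dvar_objective`, the linear-decay forward maps standing in for the Navier-Stokes solution operator, and all numerical values are hypothetical and chosen only so that the result can be checked by hand.

```python
import numpy as np

def four_dvar_objective(u, forward_maps, ys, gamma, prior_precision):
    """Prior term 0.5 u^T C_0^{-1} u plus misfit (1 / (2 gamma^2)) sum_l |y_l - G_l(u)|^2."""
    prior_term = 0.5 * u @ prior_precision @ u
    misfit = sum(np.sum((y - G(u)) ** 2) for G, y in zip(forward_maps, ys))
    return prior_term + misfit / (2.0 * gamma**2)

# Toy stand-in for the solution operator at times t_l: linear decay u -> exp(-t_l) u.
times = [0.02 * l for l in range(1, 4)]
forward_maps = [lambda u, t=t: np.exp(-t) * u for t in times]

u_true = np.array([1.0, -2.0])
ys = [G(u_true) for G in forward_maps]   # noise-free data, for illustration only

value = four_dvar_objective(u_true, forward_maps, ys, gamma=0.1,
                            prior_precision=np.eye(2))
print(value)   # only the prior term survives, since the data are matched exactly
```

With noise-free data the misfit vanishes at `u_true`, so the objective reduces to the prior term there. In a realistic setting each forward map would involve a numerical Navier-Stokes solve and the minimisation would typically rely on adjoint-based gradients, which this sketch does not attempt.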

We now describe numerical experiments concerned with studying posterior consistency in the case $\gamma\to 0$. We let $C_0 = A^{-2}$ noting that if $u\sim\mu_0$, then $u\in H^s$ almost surely for all $s<1$; in particular $u\in H$. Thus $\mu_0(H) = 1$ as required. The forcing in $f$ is taken to be $f = \nabla^\perp\Psi$, where $\Psi = \cos(\pi k\cdot x)$ and $\nabla^\perp = J\nabla$ with $J$ the canonical skew-symmetric matrix, and $k = (5,5)$. The dimension of the attractor is determined by the viscosity parameter $\nu$. For the particular forcing used there is an explicit steady state for all $\nu > 0$ and for $\nu > 0.035$ this solution is stable (see [52], Chapter 2 for details). As $\nu$ decreases the flow becomes increasingly complex and we focus subsequent studies of the inverse problem on the mildly chaotic regime which arises for $\nu = 0.01$. We use a time-step of $\delta t = 0.005$. The data is generated by computing a true signal solving the Navier-Stokes equation at the desired value of $\nu$, and then adding Gaussian random noise to it at each observation time. Furthermore, we let $h = 4\delta t = 0.02$ and take $L = 10$, so that $T = 0.2$. We take $M = 32^2$ spatial observations at each observation time. The observations are made at the gridpoints; thus the observations include all numerically resolved, and hence observable, wavenumbers in the system. Since the noise is added in spectral space in practice, for convenience we define $\sigma = \gamma/\sqrt{M}$ and present results in terms of $\sigma$.

Fig. 5.1. Illustration of posterior consistency in the fluid mechanics application. The three curves given are the relative error of the MAP estimator $u^*$ in reproducing the truth, $u^\dagger$ (solid), the relative error of the map $\mathcal{G}(u^*)$ in reproducing $\mathcal{G}(u^\dagger)$ (dashed), and the relative error of $\mathcal{G}(u^*)$ with respect to the observations $y$ (dash-dotted). Legend entries: $|u^*-u^\dagger|/|u^\dagger|$, $|\mathcal{G}(u^*)-\mathcal{G}(u^\dagger)|/|\mathcal{G}(u^\dagger)|$, $|\mathcal{G}(u^*)-y|/|y|$.

Figure 5.1 illustrates the posterior consistency which arises as the observational noise strength $\gamma\to 0$. The three curves shown quantify: (i) the relative error of the MAP estimator $u^*$ compared with the truth, $u^\dagger$; (ii) the relative error of $\mathcal{G}(u^*)$ compared with $\mathcal{G}(u^\dagger)$; and (iii) the relative error of $\mathcal{G}(u^*)$ with respect to the observations $y$. The figure clearly illustrates Theorem 4.4 via the red curve for (ii), and indeed shows that the MAP estimator itself is converging to the true initial condition, via the blue curve, as $\gamma\to 0$. Recall that the observations approach the true value of the initial condition, mapped forward under $\mathcal{G}$, as $\gamma\to 0$, and note that the pink curve shows that the image of the MAP estimator under the forward operator $\mathcal{G}$, $\mathcal{G}(u^*)$, is closer to $\mathcal{G}(u^\dagger)$ than $y$, asymptotically as $\gamma\to 0$.



6. Applications in Conditioned Diffusions. In this section we consider the MAP estimator for conditioned diffusions, including bridge diffusions and an application to filtering/smoothing. We identify the Onsager-Machlup functional governing the MAP estimator in three different cases. We demonstrate numerically that this functional may have more than one minimiser. Furthermore, we illustrate the results of the consistency theory in section 4 using numerical experiments. Subsection 6.1 concerns the unconditioned case, and includes the assumptions made throughout. Subsections 6.2 and 6.3 describe bridge diffusions and the filtering/smoothing problem respectively. Finally, subsection 6.4 is devoted to numerical experiments for an example in filtering/smoothing.

6.1. Unconditioned Case. For simplicity we restrict ourselves to scalar processes with additive noise, taking the form

\[ du = f(u)\,dt + \sigma\,dW, \quad u(0) = u^-. \tag{6.1} \]

If we let $\nu$ denote the measure on $X := C([0,T];\mathbb{R})$ generated by the stochastic differential equation (SDE) given in (6.1), and $\nu_0$ the same measure obtained in the case $f \equiv 0$, then the Girsanov theorem states that $\nu \ll \nu_0$ with density

\[ \frac{d\nu}{d\nu_0}(u) = \exp\Big(-\frac{1}{2\sigma^2}\int_0^T |f(u(t))|^2\,dt + \frac{1}{\sigma^2}\int_0^T f(u(t))\,du(t)\Big). \]

If we choose an $F: \mathbb{R}\to\mathbb{R}$ with $F'(u) = f(u)$, then an application of Itô's formula gives

\[ dF(u(t)) = f(u(t))\,du(t) + \frac{\sigma^2}{2}f'(u(t))\,dt, \]

and using this expression to remove the stochastic integral we obtain

\[ \frac{d\nu}{d\nu_0}(u) \propto \exp\Big(-\frac{1}{2\sigma^2}\int_0^T \big(|f(u(t))|^2 + \sigma^2 f'(u(t))\big)\,dt + \frac{1}{\sigma^2}F(u(T))\Big). \tag{6.2} \]

Thus, the measure $\nu$ has a density with respect to the Gaussian measure $\nu_0$ and (6.2) takes the form (1.1) with $\mu = \nu$ and $\mu_0 = \nu_0$: we have

\[ \frac{d\nu}{d\nu_0}(u) \propto \exp(-\Phi_1(u)) \]

where $\Phi_1: X\to\mathbb{R}$ is defined by

\[ \Phi_1(u) = \int_0^T \Psi(u(t))\,dt - \frac{1}{\sigma^2}F(u(T)) \tag{6.3} \]

and

\[ \Psi(u) = \frac{1}{2\sigma^2}\big(|f(u)|^2 + \sigma^2 f'(u)\big). \]

We make the following assumption concerning the vector field $f$ driving the SDE:



Assumption 6.1. The function $f = F'$ in (6.1) satisfies the following conditions.

1. $F \in C^2(\mathbb{R},\mathbb{R})$.

2. There is $M\in\mathbb{R}$ such that $\Psi(u) \ge M$ for all $u\in\mathbb{R}$ and $F(u) \le M$ for all $u\in\mathbb{R}$.

Under these assumptions, we see that $\Phi_1$ given by (6.3) satisfies Assumptions 2.1 and, indeed, the slightly stronger assumptions made in Theorem 3.5. Let $H^1[0,T]$ denote the space of absolutely continuous functions on $[0,T]$. Then the Cameron-Martin space $E_1$ for $\nu_0$ is

\[ E_1 = \Big\{ v\in H^1[0,T] \;\Big|\; \int_0^T |v'(s)|^2\,ds < \infty \text{ and } v(0) = 0 \Big\} \]

and the Cameron-Martin norm is given by $\|v\|_{E_1} = \sigma^{-1}\|v\|_{H^1}$ where

\[ \|v\|_{H^1} = \Big(\int_0^T |v'(s)|^2\,ds\Big)^{1/2}. \]

The mean of $\nu_0$ is the constant function $m \equiv u^-$ and so, using Remark 2.2, we see that the Onsager-Machlup functional for the unconditioned diffusion (6.1) is thus $I_1: E_1'\to\mathbb{R}$ given by

\[ I_1(u) = \Phi_1(u) + \frac{1}{2\sigma^2}\|u - u^-\|_{H^1}^2 = \Phi_1(u) + \frac{1}{2\sigma^2}\|u\|_{H^1}^2. \]
Together, Theorems 3.2 and 3.5 tell us that this functional attains its minimum over $E_1'$ defined by

\[ E_1' = \Big\{ v\in H^1[0,T] \;\Big|\; \int_0^T |v'(s)|^2\,ds < \infty \text{ and } v(0) = u^- \Big\}. \]

Furthermore such minimisers define MAP estimators for the unconditioned diffusion (6.1), i.e. the most likely paths of the diffusion.

We note that the regularity of minimisers for $I_1$ implies that the MAP estimator is $C^2$, whilst sample paths of the SDE (6.1) are not even differentiable. This is because the MAP estimator defines the centre of a tube in $X$ which contains the most likely paths. The centre itself is a smoother function than the paths. This is a generic feature of MAP estimators for measures defined via density with respect to a Gaussian in infinite dimensions.

6.2. Bridge Diffusions. In this subsection we study the probability measure generated by solutions of (6.1), conditioned to hit $u^+$ at time $T$ so that $u(T) = u^+$, and denote this measure $\mu$. Let $\mu_0$ denote the Brownian bridge measure obtained in the case $f \equiv 0$. By applying the approach to determining bridge diffusion measures in [15] we obtain, from (6.2), the expression

\[ \frac{d\mu}{d\mu_0}(u) \propto \exp\Big(-\int_0^T \Psi(u(t))\,dt + \frac{1}{\sigma^2}F(u^+)\Big). \tag{6.4} \]

Since $u^+$ is fixed we now define $\Phi_2: X\to\mathbb{R}$ by

\[ \Phi_2(u) = \int_0^T \Psi(u(t))\,dt \]

and then (6.4) takes again the form (1.1). The Cameron-Martin space for the (zero mean) Brownian bridge is

\[ E_2 = \Big\{ v\in H^1[0,T] \;\Big|\; \int_0^T |v'(s)|^2\,ds < \infty \text{ and } v(0) = v(T) = 0 \Big\} \]

and the Cameron-Martin norm is again $\sigma^{-1}\|\cdot\|_{H^1}$. The Onsager-Machlup functional for the bridge diffusion is thus $I_2: E_2'\to\mathbb{R}$ given by

\[ I_2(u) = \Phi_2(u) + \frac{1}{2\sigma^2}\|u - m\|_{H^1}^2 \]

where $m$, given by $m(t) = \frac{T-t}{T}u^- + \frac{t}{T}u^+$ for all $t\in[0,T]$, is the mean of $\mu_0$ and

\[ E_2' = \Big\{ v\in H^1[0,T] \;\Big|\; \int_0^T |v'(s)|^2\,ds < \infty \text{ and } v(0) = u^-,\ v(T) = u^+ \Big\}. \]

The MAP estimators for $\mu$ are found by minimising $I_2$ over $E_2'$.

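6.3. Filtering and Smoothing. We now consider conditioning the measure $\nu$ on observations of the process $u$ at discrete time points. Assume that we observe $y\in\mathbb{R}^J$ given by

\[ y_j = u(t_j) + \eta_j, \quad j = 1,\dots,J, \tag{6.5} \]

where $0 < t_1 < \dots < t_J \le T$ and the $\eta_j$ are independent identically distributed random variables with $\eta_j \sim \mathcal{N}(0,\gamma^2)$. Let $Q_0(dy)$ denote the $\mathbb{R}^J$-valued Gaussian measure $\mathcal{N}(0,\gamma^2 I)$ and let $Q(dy|u)$ denote the $\mathbb{R}^J$-valued Gaussian measure $\mathcal{N}(\mathcal{G}u,\gamma^2 I)$ where $\mathcal{G}: X\to\mathbb{R}^J$ is defined by

\[ \mathcal{G}u = (u(t_1),\dots,u(t_J)). \]

Recall $\nu_0$ and $\nu$ from the unconditioned case and define the measures $P_0$ and $P$ on $X\times\mathbb{R}^J$ as follows. The measure $P_0(du,dy) = \nu_0(du)Q_0(dy)$ is defined to be an independent product of $\nu_0$ and $Q_0$, whilst $P(du,dy) = \nu(du)Q(dy|u)$. Then

\[ \frac{dP}{dP_0}(u,y) \propto \exp\Big(-\int_0^T \Psi(u(t))\,dt + \frac{1}{\sigma^2}F(u(T)) - \frac{1}{2\gamma^2}\sum_{j=1}^{J}|y_j - u(t_j)|^2\Big) \]

with constant of proportionality depending only on $y$. Clearly, by continuity,

\[ \inf_{\|u\|_X\le 1}\exp\Big(-\int_0^T \Psi(u(t))\,dt + \frac{1}{\sigma^2}F(u(T)) - \frac{1}{2\gamma^2}\sum_{j=1}^{J}|y_j - u(t_j)|^2\Big) > 0 \]

and hence

\[ \int_{\|u\|_X\le 1}\exp\Big(-\int_0^T \Psi(u(t))\,dt + \frac{1}{\sigma^2}F(u(T)) - \frac{1}{2\gamma^2}\sum_{j=1}^{J}|y_j - u(t_j)|^2\Big)\,\nu_0(du) > 0. \]

Applying the conditioning Lemma 5.3 in [15] then gives

\[ \frac{d\mu^y}{d\nu_0}(u) \propto \exp\Big(-\int_0^T \Psi(u(t))\,dt + \frac{1}{\sigma^2}F(u(T)) - \frac{1}{2\gamma^2}\sum_{j=1}^{J}|y_j - u(t_j)|^2\Big). \]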
Fig. 6.1. Illustration of the problem of local minima of $I_3$ for the smoothing problem with a small number of observations. The process $u(t)$ starts at $u(0) = -1$ and moves in a double-well potential with stable equilibrium points at $-1$ and $+1$. Two observations of the process are indicated by the two black circles. The curves correspond to four different local minima of the functional $I_3$ for this situation.

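Thus we define

\[ \Phi_3(u) = \int_0^T \Psi(u(t))\,dt - \frac{1}{\sigma^2}F(u(T)) + \frac{1}{2\gamma^2}\sum_{j=1}^{J}|y_j - u(t_j)|^2. \]

The Cameron-Martin space is again $E_1$ and the Onsager-Machlup functional is thus $I_3: E_1'\to\mathbb{R}$, given by

\[ I_3(u) = \Phi_3(u) + \frac{1}{2\sigma^2}\|u\|_{H^1}^2. \tag{6.6} \]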
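A discretised version of this functional can be sketched as follows, assuming $\Phi_3$ and $I_3$ have the forms $\Phi_3(u) = \int_0^T\Psi(u)\,dt - \sigma^{-2}F(u(T)) + \frac{1}{2\gamma^2}\sum_j|y_j - u(t_j)|^2$ and $I_3 = \Phi_3 + \frac{1}{2\sigma^2}\|u\|_{H^1}^2$. The uniform grid, the trapezoidal rule, the finite-difference approximation of $\int|u'|^2$, and the function name `onsager_machlup_I3` are choices made for this illustration, not details taken from the paper. The Ornstein-Uhlenbeck drift in the usage lines is likewise a stand-in picked because it gives a value that can be verified by hand.

```python
import numpy as np

def onsager_machlup_I3(u, T, sigma, gamma, f, fprime, F, obs_idx, y):
    """Discretised I_3(u) on a uniform grid t_i = i T / N.

    u       : path values u(t_0), ..., u(t_N)
    obs_idx : grid indices of the observation times t_1, ..., t_J
    y       : the J observed values
    """
    u = np.asarray(u, dtype=float)
    dt = T / (len(u) - 1)
    psi = (f(u) ** 2 + sigma**2 * fprime(u)) / (2.0 * sigma**2)
    integral = dt * (psi.sum() - 0.5 * (psi[0] + psi[-1]))      # trapezoidal rule
    misfit = np.sum((np.asarray(y) - u[obs_idx]) ** 2) / (2.0 * gamma**2)
    h1 = np.sum(np.diff(u) ** 2) / dt                           # approximates int |u'|^2 dt
    return integral - F(u[-1]) / sigma**2 + misfit + h1 / (2.0 * sigma**2)

# Stand-in drift f(u) = -u with F(u) = -u^2 / 2 (an assumption for this check only).
f = lambda u: -u
fprime = lambda u: -np.ones_like(u)
F = lambda u: -0.5 * u**2

T, N = 1.0, 100
value = onsager_machlup_I3(np.zeros(N + 1), T, 1.0, 0.5, f, fprime, F,
                           obs_idx=[50, 100], y=[0.0, 0.0])
print(value)   # for the zero path only the integral of Psi = -1/2 contributes
```

A function of this kind can be handed to a general-purpose optimiser over the free grid values $u(t_1),\dots,u(t_N)$ while keeping $u(t_0)$ fixed at the initial condition.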

The MAP estimator for this setup is, again, found by minimising the Onsager-Machlup functional $I_3$.

The only difference between the potentials $\Phi_1$ and $\Phi_3$, and thus between the functionals $I_1$ for the unconditioned case and $I_3$ for the case with discrete observations, is the presence of the term $\frac{1}{2\gamma^2}\sum_{j=1}^{J}|y_j - u(t_j)|^2$. In the Euler-Lagrange equations describing the minima of $I_3$, this term leads to Dirac distributions at the observation points $t_1,\dots,t_J$ and it transpires that, as a consequence, minimisers of $I_3$ have jumps in their first derivatives at $t_1,\dots,t_J$. This effect can be clearly seen in the local minima of $I_3$ shown in figure 6.1.

6.4. Numerical Experiments. In this section we perform three numerical experiments related to the MAP estimator for the filtering/smoothing problem presented in section 6.3.

Fig. 6.2. Illustration of posterior consistency for the smoothing problem in the small-noise limit. The marked points correspond to the maximum-norm distance between the true signal $u^\dagger$ and the MAP estimator $u_\gamma$ with $J = 5$ evenly spaced observations. The map $\mathcal{G}(u) = (u(t_1),\dots,u(t_J))$ is the projection of the path onto the observation points. The solid line is a fitted curve of the form $c\gamma$.



For the experiments we generate a random "signal" by numerically solving the SDE (6.1), using the Euler-Maruyama method, for a double-well potential $F$ given by

\[ F(u) = -\frac{(1-u)^2(1+u)^2}{1+u^2} \]

with diffusion constant $\sigma = 1$ and initial value $u^- = -1$. From the resulting solution $u(t)$ we generate random observations $y_1,\dots,y_J$ using (6.5). Then we implement the Onsager-Machlup functional from equation (6.6) and use numerical minimisation, employing the Broyden-Fletcher-Goldfarb-Shanno method, to find the minima of $I_3$.
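The signal-generation step can be sketched as below. This is an illustration under stated assumptions: the potential is taken to be $F(u) = -(1-u)^2(1+u)^2/(1+u^2)$, the drift `f` is its derivative worked out by hand, and the time horizon, step count, noise level and random seed are hypothetical values not given in the text.

```python
import numpy as np

def F(u):
    """Assumed double-well potential; its maxima at -1 and +1 are stable for du = F'(u) dt."""
    return -((1 - u) ** 2) * (1 + u) ** 2 / (1 + u**2)

def f(u):
    """Drift f = F', derived by hand from the assumed F."""
    return 2 * u * (1 - u**2) * (3 + u**2) / (1 + u**2) ** 2

def euler_maruyama(u0, T, n_steps, sigma, rng):
    """Simulate du = f(u) dt + sigma dW on [0, T] with a uniform time step."""
    dt = T / n_steps
    u = np.empty(n_steps + 1)
    u[0] = u0
    for i in range(n_steps):
        u[i + 1] = u[i] + f(u[i]) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    return u

def observe(u, obs_idx, gamma, rng):
    """Noisy point observations y_j = u(t_j) + eta_j with eta_j ~ N(0, gamma^2)."""
    return u[obs_idx] + gamma * rng.standard_normal(len(obs_idx))

rng = np.random.default_rng(42)
T, n_steps = 10.0, 1000                                  # hypothetical horizon and resolution
signal = euler_maruyama(-1.0, T, n_steps, sigma=1.0, rng=rng)
obs_idx = np.linspace(0, n_steps, 6, dtype=int)[1:]      # J = 5 evenly spaced observation times
y = observe(signal, obs_idx, gamma=0.1, rng=rng)
```

The resulting `signal`, `obs_idx` and `y` are the inputs a discretised Onsager-Machlup functional would need. Restarting a quasi-Newton minimiser from several initial paths is one way to look for the multiple local minima discussed next.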

The first experiment concerns the problem of local minima of $I_3$. For a small number of observations we find multiple local minima; the minimisation procedure can converge to different local minima, depending on the starting point of the optimisation. This effect makes it difficult to find the MAP estimator, which is the global minimum of $I_3$, numerically. The problem is illustrated in figure 6.1, which shows four different local minima for the case of $J = 2$ observations. One would expect this problem to become less pronounced as the number of observations increases, since the observations will "pull" the MAP estimator towards the correct solution, thus reducing the number of local minima. This effect is confirmed by experiments: for larger numbers of observations our experiments found only one local minimum.

The second experiment concerns posterior consistency of the MAP estimator in the small noise limit. Here we use a fixed number $J$ of observations of a fixed path of (6.1), but let the variance $\gamma^2$ of the observational noise $\eta_j$ converge to 0. Noting that the exact path of the SDE, denoted by $u^\dagger$ in (4.8), has the regularity of a Brownian motion and therefore the observed path is not contained in the Cameron-Martin space $E_1$, we are in the situation described in Corollary 4.5. Our experiments indicate that we have $\mathcal{G}(u_\gamma) \to \mathcal{G}(u^\dagger)$ as $\gamma \downarrow 0$, where $u_\gamma$ denotes the MAP estimator corresponding to observational variance $\gamma^2$, confirming the result of Corollary 4.5. The result of a simulation with $J = 5$ is shown in figure 6.2.

Fig. 6.3. Illustration of posterior consistency for the smoothing problem in the large-data limit. The marked points correspond to the supremum-norm distance between the true signal $u^\dagger$ and the MAP estimator $u_J$ with $J$ evenly spaced observations. The solid line gives a fitted curve of the form $cJ^{-\alpha}$; the exponent $\alpha = 1/4$ was found numerically.



Finally, we perform an experiment to illustrate posterior consistency in the large-data limit: for this experiment we still use one fixed path of the SDE (6.1). Then, for different values of $J$, we generate observations $y_1,\dots,y_J$ using (6.5) at equidistantly spaced times $t_1,\dots,t_J$, for fixed $\gamma = 1$, and then determine the $L^\infty$ distance of the resulting MAP estimate $u_J$ to the exact path $u^\dagger$. The situation considered here is not covered by the theoretical results from section 4, but the results of the numerical experiment, shown in figure 6.3, indicate that posterior consistency still holds.